PREFACE


There is a continuing need to monitor the current performance of any existing
communication network. When only limited data on these networks is available, methods are
needed to monitor a network over time in order to determine its performance and detect
any degradation.

The  purpose  of  this  study  is  to  identify  viable  performance  measures  for  a 
communication  network  derived  from  limited  data.  Then,  control  chart  procedures  will 
be  applied  to  these  performance  measures  in  order  to  monitor  them  over  time.  These 
control  chart  procedures  should  provide  a  straightforward  and  near-real-time  technique 
for  monitoring  the  performance  of  a  communication  network. 

I thank my advisor, Dr. Edward Mykytka, for his excellent guidance through the
world of statistical process control and for his “smiling” acceptance of my “unorthodox”
timeline in completing this thesis. I also thank my reader, Dr. Yupo Chan, for his help in
understanding communication networks and for his insights.

Finally, I thank my wonderful husband, Franco, for being “Mr. Mom” on quite a
few  occasions  and  for  the  enormous  support  he  gave  me.  I  also  need  to  thank  my  newest 
“addition”,  Dominic,  who  completed  this  Master’s  degree  right  along  with  me  and  is 
probably  an  expert  in  control  charts  at  the  age  of  5  months. 


Maureen  “Mo”  Borgia 
ABSTRACT


This  study  investigates  the  application  of  statistical  process  control  methods  to 
monitoring  the  performance  of  a  communication  network.  The  methods  applied  include 
four  different  types  of  control  charts.  The  literature  search  uncovered  only  one  previous 
study  that  used  a  control  chart  to  monitor  a  communication  network. 

Using  a  case  study  of  a  communication  network,  four  important  issues  for  proper 
control  chart  usage  are  emphasized.  These  issues  are:  proper  data  collection  rate  due  to 
autocorrelation,  proper  subgrouping  of  the  data,  ensuring  that  count  data  conforms  to  the 
assumptions  of  the  binomial  probability  model  before  implementing  p  or  np  control 
charts,  and  viability  of  using  subgroups  of  attribute  data  as  measurement  data  on  an  x-bar 
chart.  The  results  indicate  that  control  charts  are  indeed  a  viable  method  for  monitoring  a 
communication  network’s  performance  over  time,  especially  when  the  available  data  on 
the  network  is  limited. 
IDENTIFICATION AND EVALUATION OF MONITORING
TECHNIQUES  FOR  THE  PERFORMANCE  OF  A 
COMMUNICATION  NETWORK 


1.  Introduction 


1.1  Background 

By  definition,  a  communication  network  is  represented  by  a  set  of  nodes  that  are 
interconnected  by  transmission  links.  The  nodes  can  be  user  terminals  or  switches  that 
pass  information  along  to  the  next  node  (7:3).  The  links  can  be  wire,  cable,  radio, 
satellite links, or fiber optics (23:6). Links can be directed or undirected. On a directed
link, communication can only take place in one direction between the nodes it connects,
whereas on an undirected link, communication can take place in both directions (7:3).

The  sponsor  of  this  thesis  is  in  charge  of  monitoring  a  communication  network 
and  evaluating  its  performance.  The  sponsor  is  seeking  guidance  on  methods  to: 

Proactively  monitor  the  reliability,  availability,  and  degradation  of  networks...; 
account  for  their  performance  under  fully  automated  resource  conditions  through 
optimum  resource  utilization;  and  model  new  requirements  and  Level  of  Service 
Agreement  specifications  to  validate  the  performance  of  the  system  (20:1). 

Notional failure data was provided which represents the type of performance information
that can be observed from the communication network. This data consists of a log of
times that specific links changed state (from up to down or vice versa). An example of
this data is shown in Table 1.1. In addition, monthly summaries of overall network
performance were provided, containing information such as the average down time for any
link and the mean time between failures over all links.
Table 1.1 Example Log of Network Data

Date     Link Number   Failure Time   Up Time    Failure Duration
Mar 10   12            06:11:15       06:30:25   00:19:10
Mar 10   42            06:20:03       06:25:15   00:05:12
Mar 10   3             06:21:00       07:40:06   01:19:06
Mar 10   21            06:40:17       07:05:27   00:25:10
Mar 10   8             06:45:48       06:55:59   00:10:11
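The Failure Duration column in Table 1.1 is simply Up Time minus Failure Time. The minimal Python sketch below recomputes it from the table rows; it is an illustration, not the sponsor's processing. The rule for a repair that crosses midnight is an added assumption, since the log excerpt does not show how such cases are recorded.

```python
from datetime import datetime, timedelta

# Rows transcribed from Table 1.1: (date, link number, failure time, up time)
LOG = [
    ("Mar 10", 12, "06:11:15", "06:30:25"),
    ("Mar 10", 42, "06:20:03", "06:25:15"),
    ("Mar 10", 3, "06:21:00", "07:40:06"),
    ("Mar 10", 21, "06:40:17", "07:05:27"),
    ("Mar 10", 8, "06:45:48", "06:55:59"),
]


def failure_duration(fail_time: str, up_time: str) -> timedelta:
    """Return up_time - fail_time for 'HH:MM:SS' strings."""
    fmt = "%H:%M:%S"
    start = datetime.strptime(fail_time, fmt)
    end = datetime.strptime(up_time, fmt)
    if end < start:  # assumption: the repair happened on the following day
        end += timedelta(days=1)
    return end - start


for date, link, down, up in LOG:
    print(date, link, failure_duration(down, up))
```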

The  performance  of  the  communication  network  will  be  required  to  conform  to 
Level  of  Service  (LOS)  Agreements  that  are  to  be  developed  between  the  sponsor  and  the 
customers  of  the  communication  network.  However,  no  specified  monitoring  method  or 
technique  is  currently  in  use,  nor  are  LOS  specifications  and  agreements  currently 
defined.  As  stated  above,  the  general  categories  of  performance  measures  being 
considered  for  inclusion  in  these  LOS  Agreements  are  reliability,  availability,  and 
degradation.  Appropriate  measures  of  performance  in  these  categories  need  to  be 
identified  and  investigated  for  their  merits  towards  representing  the  sponsor’s 
communication network. These measures must be derived from the observable data of
the network. Control charts were suggested by the sponsor as a possible technique for
monitoring the network’s performance and are the focus of this research.
1.2 Research Objectives


1.2.1  Overall 

The  primary  objectives  of  this  thesis  are  to  (i)  identify  and  evaluate  possible 
statistical  process  control  methods  (primarily  control  charts)  that  could  be  used  to 
proactively monitor communication network performance over time, (ii) automate the
best of these methods into a user-friendly software package, and (iii) relate these methods
to the development of appropriate LOS Agreements.

1.2.2  Specific  Requirements. 

The  following  specific  requirements  must  be  accomplished  in  order  to  complete 
this  research: 

1. Identify and evaluate related work in this field.

2.  Identify  possible  performance  measures  that  can  be  observed  and  used  to 
represent  the  reliability,  availability,  and  degradation  of  the  communication 
network  over  time. 

3.  Identify  appropriate  statistical  process  control  techniques  that  can  be  used  to 
monitor  each  candidate  performance  measure. 

4. Develop an appropriate model that could be used to describe the theoretically
expected performance of the network, for use in developing appropriate
‘standards’ for control charts and LOS Agreement specifications.

5.  Identify  methods  for  relating  the  proposed  model  and  process  control 
techniques  to  potential  LOS  Agreements. 

6.  Evaluate  the  proposed  control  techniques  through: 

-  consideration of theoretical properties based on a model of network
operation and performance,

-  validation and demonstration of these procedures through a case study of
network performance, especially demonstrating how degradation can be
monitored, and

-  since  data  from  the  actual  network  is  not  available,  development  of  a 
model  of  network  operation  from  which  simulated  data  can  be  observed. 

7.  Develop  EXCEL  spreadsheets  and  macros  for  implementing  the  proposed 
control  techniques  (8). 


1.3  Assumptions 

Based  on  discussions  with  the  sponsor  and  concurrence  with  other  research  in  this 
area,  the  following  assumptions  are  used  throughout  this  research  effort  unless  otherwise 
noted: 

1. Nodes are not subject to failure. [The precedent for this assumption was set
in a previous thesis effort for the sponsor by Van Hove (27:10, 54-5).
Networks with failing nodes can be modified to conform to this assumption by
replacing the failing node with two reliable nodes connected by a failing link,
as demonstrated in previous theses by Yim (30:10,49), Gaught (9:17,22),
Jansen (12:40), and Van Hove (27:55).]

2. Links are subject to total failure only, i.e., they are either ‘up’ or ‘down’ and
they do not operate in a degraded condition. Further, a link’s failure can be
due to any cause, including routine maintenance. [Total failure of links is a
common assumption used in previous thesis efforts by Yim (30:3), Gaught
(9:3), Jansen (12:3), and Van Hove (27:5).]

3.  Link  failures  are  independent.  [This  assumption  is  consistent  with  previous 
thesis  efforts  by  Yim  (30:17,51),  Gaught  (9:3),  Jansen  (12:3),  and  Van  Hove 
(27:10).] 
4. Links are directed (one-way); flow is permitted in one direction only. [This
assumption  is  also  consistent  with  previous  thesis  efforts  by  Yim  (30:3), 
Gaught  (9:3),  Jansen  (12:3),  and  Van  Hove  (27:40,47).  This  assumption  only 
impacts  the  computation  of  the  number  of  paths  existing  between  a  source  and 
sink  node.  This  number  is  then  used  in  computing  certain  network 
performance  measures.  This  assumption  can  be  relaxed,  but  then  specific 
information  about  network  structure  and  protocols  implemented  is  required.] 

5. Only the ‘status history’ of links can be observed from the network, i.e., a log
of times for link status changes (up or down). No other network information is
available, such as ‘flows’ (amounts of information transmitted over links per
time interval), ‘error rates’ (proportions of transmitted information that are
received in error), or link reliabilities. This assumption is consistent with the
notional data and monthly summaries provided by the sponsor.

6. Changes in link status are observed and recorded in real time but are reported
to a ‘network monitor’ only at 300-second intervals. The ‘network monitor’ is
the entity that monitors the network. [This assumption is consistent
with the information and notional data provided by the sponsor.]

7. The network is assumed to perform under fully automated resource conditions
which enable it to optimally use its resources. Thus, for example, if a link
fails, traffic which could use that link is automatically rerouted to an alternate
path (if available).

8. No Level of Service (LOS) Agreements currently exist.
1.4 Scope

Currently,  the  sponsor  monitors  the  network  in  terms  of  which  links  are  “up”  and 
“down”,  and  records  this  information  in  a  log  of  the  times  of  their  failure  and  repair.  This 
thesis  will  be  limited  to  an  examination  of  statistical  process  control  (SPC)  procedures 
that  can  be  applied  to  performance  measures  which  can  be  computed  from  this  available 
data.  Additionally,  the  performance  of  a  communication  network  can  be  monitored  from 
three  viewpoints.  First,  the  network  as  a  whole  can  be  monitored  by  aggregating 
measurements and readings over all links in the network. Second, the network can be
monitored  from  a  customer’s  perspective  by  monitoring  the  paths  between  the 
customer’s  source-termination  (s-t)  nodes.  Finally,  each  link  in  the  network  can  be 
individually  monitored  for  indications  of  degradation  or  failure. 
2. Literature Review and Assessment


This  chapter  presents  a  review  of  literature  applicable  to  the  use  of  statistical 
process  control  techniques  for  monitoring  communication  network  performance.  In 
addition,  the  performance  measures  applicable  to  the  sponsor’s  communication  network 
and  the  observable  data  are  identified.  The  literature  reviewed  covers  books,  current 
journals, conference proceedings, and theses. The following sections discuss commonly
used  performance  measures,  performance  measures  applicable  to  the  sponsor’s  network, 
prior  theses  in  communication  network  performance,  statistical  process  control  (SPC) 
techniques,  and  prior  applications  of  SPC  techniques  to  communication  networks. 


2.1  Common  Performance  Measures 

There  are  certain  measures  of  communication  network  performance  that  are 
commonly  used  in  the  literature.  These  are:  time  delay,  reliability,  availability,  and  bit 
error  rate.  Each  is  discussed  below. 

2.1.1  Time  Delay. 

A number of sources identify time delay as an important measure of performance.
Each source uses a different name for this delay, such as end-to-end time delay (23:22),
average time in system for all messages (7:91), network average delay (15:1108), and
message delay (14:24), but all refer to the same quantity. Unfortunately, this measure
is not available in the data that is currently observed from the network and, hence, time
delay will not be used as a performance measure in this thesis. (If time delay could be
observed, it could be readily monitored using the variables control charts discussed in
subsequent sections.)

2.1.2  Reliability. 

Another  common  and  useful  measure  of  communication  network  performance  is 
reliability  (12:7).  Reliability  is  defined  as  the  probability  that  a  system/component  will 
operate  without  degradation  and  be  able  to  perform  a  certain  mission  for  a  certain  length 
of  time  given  that  the  system/component  was  operating  initially  (19:43). 

The reliability of a specific link can be computed theoretically if the time-to-failure
distribution for that link is known. In particular, if the time to failure for a link
can be modeled as a random variable that has probability density function (PDF) f(x),
then the reliability of that link can be computed as (21:433-4):

$$\rho(T) = P[\text{link still operating at time } T] = 1 - \int_0^T f(x)\,dx = \int_T^\infty f(x)\,dx$$

Then, the reliability associated with a particular path composed of n independent links,
arbitrarily numbered 1 through n, is given by:

$$r(T) = \prod_{i=1}^{n} \rho_i(T)$$

where ρ_i(T) denotes the reliability of the ith link. If we could then represent the portion
of the network connecting a source node, s, to a termination node, t, as a collection of k
independent and parallel paths, then the reliability of that portion of the network could be
determined as:

$$R(T) = 1 - \prod_{j=1}^{k} \left[1 - r_j(T)\right]$$

where r_j(T) represents the reliability of the jth path.
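As a minimal illustration of the three expressions above, the Python sketch below assumes, purely hypothetically, exponential time-to-failure distributions, for which ρ(T) = exp(-λT). The failure rates and the 24-hour mission length are invented, and the parallel formula is applied only to link-disjoint paths, where its independence assumption holds.

```python
import math


def link_reliability_exponential(failure_rate: float, t: float) -> float:
    """rho(T) = 1 - F(T) = exp(-failure_rate * T) for an exponential time to failure."""
    return math.exp(-failure_rate * t)


def path_reliability(link_reliabilities) -> float:
    """r(T): a path of independent links in series works only if every link works."""
    return math.prod(link_reliabilities)


def parallel_reliability(path_reliabilities) -> float:
    """R(T) = 1 - prod_j [1 - r_j(T)]; valid only for independent paths."""
    return 1.0 - math.prod(1.0 - r for r in path_reliabilities)


T = 24.0  # hypothetical mission length, hours
# Hypothetical per-hour failure rates for the links of two link-disjoint paths.
path_rates = [[0.001, 0.002], [0.0015, 0.0005, 0.001]]
r = [path_reliability(link_reliability_exponential(lam, T) for lam in rates)
     for rates in path_rates]
print(r, parallel_reliability(r))
```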
Unfortunately, this last expression is not generally appropriate for most
communication networks since, even though individual links may behave independently
of one another, alternate paths may share common links and, thus, would not be
independent. In such situations, the reliability of a particular portion of the network can
be determined by first developing an appropriate structure function as described, for
example, in (21:412-17). This development is omitted here since the approach (i) is
straightforward but tedious, (ii) would need to be applied uniquely to each source
node-termination node pair, (iii) requires link time-to-failure distributions to be known, and
(iv)  provides  a  means  of  evaluating  the  expected  performance  of  the  system  but  has 
limited  value  for  monitoring  system  performance  over  time.  It  is  important  to  note, 
however,  that  such  a  system  reliability  approach  would  appear  to  provide  a  useful  and 
tractable  way  to  model  communication  between  particular  pairs  of  source  and  termination 
nodes. 
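To make the shared-link problem concrete, the sketch below computes exact s-t reliability by brute-force enumeration of all 2^m link states, a direct evaluation of a structure function that equals 1 when t can be reached from s over links that are up. The four-link network and the 0.9 link reliabilities are hypothetical. Enumeration grows exponentially with the number of links, which reflects the "straightforward but tedious" character noted above and restricts it to small networks.

```python
from itertools import product


def reachable(up_links, s, t):
    """Depth-first search: can t be reached from s using only links that are up?"""
    adjacency = {}
    for u, v in up_links:  # links are directed (u -> v)
        adjacency.setdefault(u, []).append(v)
    stack, seen = [s], {s}
    while stack:
        node = stack.pop()
        if node == t:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def st_reliability(reliability, s, t):
    """Exact s-t reliability by enumerating every up/down state of every link.

    reliability maps each directed link (u, v) to its probability of being up;
    link states are assumed independent, as in the thesis assumptions.
    """
    links = list(reliability)
    total = 0.0
    for state in product((True, False), repeat=len(links)):
        prob = 1.0
        up_links = []
        for link, is_up in zip(links, state):
            prob *= reliability[link] if is_up else 1.0 - reliability[link]
            if is_up:
                up_links.append(link)
        if reachable(up_links, s, t):  # structure function equals 1 for this state
            total += prob
    return total


# Hypothetical network: both s-t paths share the link s -> a.
rel = {("s", "a"): 0.9, ("a", "t"): 0.9, ("a", "b"): 0.9, ("b", "t"): 0.9}
print(st_reliability(rel, "s", "t"))
```

Here the exact value is 0.9 × (1 - 0.1 × 0.19) = 0.8829, whereas applying the independent-parallel-paths formula to the two paths (r_1 = 0.81, r_2 = 0.729) would give about 0.9485, overstating reliability because both paths depend on the link s -> a.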

The  formal  definition  of  reliability,  however,  does  suggest  some  related 
performance  measures.  Although  these  do  not  directly  measure  reliability  per  se,  they  do 
provide  meaningful  measures  that  can  be  used  to  detect  changes  in  the  reliability  of  the 
network  or  its  components  (i.e.,  of  links,  paths,  or  collections  of  paths).  One  such 
measure  is  the  proportion  of  components  that  do  not  fail  over  a  specified  time  interval. 
Another  is  the  Mean  Time  To  Failure  (MTTF),  defined  as  the  expected  length  of  time  a 
component  successfully  operates  before  it  fails.  For  a  specific  link,  this  is  simply  the 
mean  of  the  time-to-failure  distribution.  A  closely  related  measure  is  the  Mean  Time 
Between  Failures  (MTBF),  which  is  the  Mean  Time  To  Failure  plus  the  mean  time  to 
repair  (MTTR).  Strictly  speaking,  however,  since  MTBF  explicitly  considers  the 
possibility  of  repair,  it  perhaps  should  be  classified  among  the  measures  of  availability 
which  follow. 
2.1.3 Availability.

Availability is defined as either the probability that a system is functional at a
given time or the proportion of time that a system is functional (25:41). This differs
from the definition of reliability in that reliability is the probability that a system will
operate without degradation for a certain length of time, rather than at a given time.
Availability also implicitly recognizes that components are repaired once they fail. Myers
and others list three more specific definitions of availability:

1. Instantaneous availability. The probability that the system will be available
[functional] at any random time t.

2.  Mission  availability.  The  proportion  of  time  in  an  interval  that  the  system  is 
available  for  use. 

3.  Steady-state  availability.  The  proportion  of  time  that  the  system  is  available 
for  use  when  the  time  interval  considered  is  very  large.  (19:49) 

One common equation for computing steady-state availability is (19:52):

$$\text{Steady-state availability} = \frac{MTTF}{MTTF + MTTR}$$

where MTTF = Mean-Time-To-Failure and MTTR = Mean-Time-To-Repair. Kubat gives
a definition of network availability that agrees with mission availability as defined above
(13:309):

$$A_{Network} = \frac{E\{\text{uptime during one cycle}\}}{E\{\text{cycle time}\}}$$

where a cycle is the time interval of interest.
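The difference between steady-state and mission availability can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed procedure: times are in seconds, outages for a link are assumed not to overlap, and the one-hour window 06:00 to 07:00 applied to link 3 of Table 1.1 is chosen only as an example.

```python
def steady_state_availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def mission_availability(down_intervals, start: float, end: float) -> float:
    """Proportion of the window [start, end) during which the component is up.

    down_intervals: (failure_time, repair_time) pairs in seconds, assumed
    non-overlapping; only the part of each outage inside the window counts.
    """
    down = 0.0
    for fail, repair in down_intervals:
        overlap = min(repair, end) - max(fail, start)
        if overlap > 0:
            down += overlap
    return 1.0 - down / (end - start)


def hms(text: str) -> int:
    """Convert 'HH:MM:SS' to seconds after midnight."""
    h, m, s = (int(x) for x in text.split(":"))
    return 3600 * h + 60 * m + s


# Link 3 in Table 1.1 was down from 06:21:00 to 07:40:06.
link3 = [(hms("06:21:00"), hms("07:40:06"))]
print(mission_availability(link3, hms("06:00:00"), hms("07:00:00")))
```

Link 3 is down for 39 of the 60 minutes in that window, so its mission availability for the hour is about 0.35, a value that says nothing by itself about the link's long-run (steady-state) behavior.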
2.1.4 Bit Error Rate.


One final measure of performance that the literature indicates as
important is the bit error rate (BER). This is the number of bits received in error
divided by the total number of bits received (4:Ch 1,2). BER
provides  a  common  measure  of  network  degradation  but,  since  this  data  is  not  currently 
observable  from  the  sponsor’s  communication  network,  this  measure  will  not  be  used  in 
this  research  effort. 


2.2  Applicable  Performance  Measures 

In the preceding section, a number of commonly used communication network
performance  measures  were  introduced.  Most  of  these  are  measures  of  theoretical  or 
expected  system  performance  which  require  knowledge  of  certain  system  characteristics, 
such  as  time-to-failure  distributions  for  each  link.  Although  these  measures  provide  a 
useful  means  of  describing  a  system,  they  are  not  directly  useful  for  monitoring  current 
network  performance.  Instead,  measures  that  can  be  computed  based  on  the  observed 
performance  of  the  network  are  required.  This  section  describes  the  particular  measures, 
or  quality  characteristics  of  the  communication  network,  that  will  be  used  in  this  research 
to  evaluate  network  performance. 

As  was  stated  in  Chapter  1,  the  communication  network  can  be  monitored  from 
three  different  viewpoints:  overall  network  performance  (aggregating  measurements  over 
all links at a system level), network performance for a given customer’s (s-t) pair (at an
s-t level), and individual link performance (at a link level). Each of these viewpoints has
performance measure(s) best suited to it. Remembering the sponsor’s initial
goal  to  monitor  the  reliability,  availability,  and  degradation  of  networks,  some 
appropriate  performance  measures  are  now  identified  to  accomplish  this  goal. 
Since no information is available about the processes by which links degrade over
time,  nor  can  this  be  directly  measured  from  the  data  available,  degradation  will  be 
monitored  indirectly  through  observation  of  performance  measures  related  to  reliability 
and  availability.  These  measures  can,  in  turn,  be  expected  to  reflect  any  degraded 
performance  of  the  network.  To  facilitate  this  indirect  monitoring  of  degradation,  it  is 
assumed  that  links  either  fail  more  often  or  remain  down  for  longer  periods  of  time  when 
they  are  in  a  degraded  state. 

2.2.1  Overall  Network  Performance. 

Since  the  link  failures  are  assumed  to  be  independent,  one  measure  of  overall  link 
reliability,  termed  p-up  in  this  study,  is  the  proportion  of  operating  links  that  are 
observed at a given instant of time (specifically at the 300-second reporting interval
described in Chapter 1):

$$p\text{-up} = \frac{\text{total operating links}}{\text{total number of links}}$$

As network performance degrades, this measure would be expected to show a decrease
since fewer links would be operating. An ‘opposite’ measure/proportion, which will be
termed p-down, can also be calculated at any instant of time as:

$$p\text{-down} = \frac{\text{total down links}}{\text{total number of links}}$$


This  measure  is  expected  to  increase  as  network  performance  degrades  since  more  links 
will  be  down.  Alternately,  the  number  of  links  up  or  links  down  at  any  instant  of  time 
could  also  be  used  as  a  performance  measure.  Links  down,  termed  DwnLnk,  will  be  used 
arbitrarily  in  this  study.  This  measure  is  expected  to  increase  with  network  degradation. 
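The snapshot measures need only the up/down status of each link at a report time. In the minimal Python sketch below, the network size (50 links numbered 1 to 50) is hypothetical and the outages are the five in Table 1.1. A link counts as down at report time t when failure time ≤ t < up time, and each link has at most one outage, a simplification made for illustration.

```python
def hms(text: str) -> int:
    """Convert 'HH:MM:SS' to seconds after midnight."""
    h, m, s = (int(x) for x in text.split(":"))
    return 3600 * h + 60 * m + s


# Outages from Table 1.1: link -> (failure time, up time)
OUTAGES = {
    12: ("06:11:15", "06:30:25"),
    42: ("06:20:03", "06:25:15"),
    3: ("06:21:00", "07:40:06"),
    21: ("06:40:17", "07:05:27"),
    8: ("06:45:48", "06:55:59"),
}


def link_status(links, outages, report_time):
    """Map each link to True (up) or False (down) at the given report time."""
    t = hms(report_time)
    status = {}
    for link in links:
        if link in outages:
            fail, repair = (hms(x) for x in outages[link])
            status[link] = not (fail <= t < repair)
        else:
            status[link] = True
    return status


def snapshot_measures(status):
    """p-up, p-down and DwnLnk computed from one status snapshot."""
    total = len(status)
    down = sum(1 for is_up in status.values() if not is_up)
    return {"p-up": (total - down) / total, "p-down": down / total, "DwnLnk": down}


status = link_status(range(1, 51), OUTAGES, "06:25:00")  # hypothetical 50-link network
print(snapshot_measures(status))
```

At 06:25:00, links 12, 42 and 3 are down, so DwnLnk = 3, p-down = 0.06 and p-up = 0.94.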
An important point here is that at any reporting time, all that needs to be checked
for  the  performance  measures  to  be  computed  is  the  status  of  each  link  (up  or  down). 
Failure  and  repair  times  are  not  used  in  the  above  measures  and  thus,  they  provide  a 
‘snapshot’  of  the  network  status  at  each  reporting  time. 

2.2.2  Network  Performance  for  a  Customer. 

A  measure  is  needed  to  express  network  performance  for  a  given  customer’s 
source-termination  (s-t)  node  pair.  This  measure/proportion,  which  will  be  denoted  as 
p-path,  can  be  calculated  every  reporting  time  as: 

$$p\text{-path} = \frac{\#\text{ operating paths}(s\text{-}t)}{\#\text{ total paths}(s\text{-}t)}$$

This  proportion  does  not  directly  measure  path  reliability  since  the  paths  are  not  all 
independent,  but  it  is  a  useful  indicator  of  (s-t)  network  performance  nonetheless.  As 
network  performance  degrades,  this  measure  would  be  expected  to  show  a  decrease  since 
fewer  links  would  be  operating  which,  in  turn,  should  cause  fewer  paths  to  be  operating. 
Here too, at every reporting time all that needs to be checked is the status of each link,
which, in turn, is used to determine the status of each path. Failure and repair times are
not used, just a ‘snapshot’ of the network at each reporting time.
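Given the list of paths for an (s-t) pair, each written as the links it uses, p-path needs only the current link statuses. In the minimal Python sketch below the path list is assumed to be known already (enumerating the paths is a separate step); the three paths and six link numbers are hypothetical, and link 1 is deliberately shared by two paths.

```python
def path_is_up(path, status):
    """A directed path operates only if every link on it is up."""
    return all(status[link] for link in path)


def p_path(paths, status):
    """Proportion of the (s-t) paths that are operating at this report time."""
    operating = sum(1 for path in paths if path_is_up(path, status))
    return operating / len(paths)


# Hypothetical (s-t) pair with three paths; link 1 is shared by the first two.
paths_st = [[1, 2], [1, 3, 4], [5, 6]]
status = {link: True for link in range(1, 7)}
status[1] = False  # a single link failure
print(p_path(paths_st, status))
```

With link 1 down, two of the three paths fail together and p-path = 1/3, which illustrates why p-path is an indicator rather than a path reliability: paths that share links are not independent.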

The  communication  network  monitored  by  the  sponsor  is,  generally,  a  collection 
of  40  to  50  nodes,  each  connected  to  between  1  and  10  links.  As  such,  the  network  is 
expected  to  offer  at  least  a  moderate  number  of  alternate  paths  between  most  (s-t)  node 
pairs.  In  this  case,  any  degradation  in  link  performance  may  have  only  a  slight  to 
moderate  impact  on  overall  network  or  customer  (s-t)  performance.  This  small  impact 
may be difficult to detect using the previously described ‘larger-scale’ network
performance measures. As a result, a proactive monitoring strategy would appear to
place  emphasis  on  monitoring  individual  link  performance  in  order  to  detect  and  correct 
‘low-level’  degradations  before  they  significantly  impact  overall  network  performance. 
For  this  reason,  the  bulk  of  attention  in  this  thesis  is  focused  at  the  individual  link  level. 
This  is  fortuitous  because  performance  at  this  level  is  also  the  easiest  and  most 
straightforward  to  monitor. 

2.2.3  Individual  Link  Performance. 

Monitoring at this level requires data to be collected for each link individually. A common
availability measure can be calculated for each link as:

$$\text{Availability} = \frac{\text{total link uptime during one time interval}}{\text{total interval time}}$$


for  intervals  of  1  hour  and/or  1  day.  Care  must  be  taken  in  choosing  the  cycle  length, 
since  a  cycle  length  shorter  than  the  mean  time  between  link  failures  would  not  produce 
an  accurate  calculation  due  to  lack  of  enough  (or  any)  representative  data  during  the 
interval.  This  will  be  demonstrated  explicitly  during  the  case  study.  Any  degradation  of 
an  individual  link’s  performance  can  be  expected  to  decrease  this  availability  measure.  A 
related  measure  or  proportion,  denoted  p-link,  can  also  be  calculated  by  computing  the 
proportion  of  reporting  times  during  an  interval  that  the  link  is  found  to  be  operating: 


$$p\text{-link} = \frac{\text{total times link is found operating per interval}}{\text{total reports per interval}}$$
for intervals of 1 hour (12 reports per hour) and/or 1 day (288 reports per day). Again,
any  degradation  of  an  individual  link’s  performance  is  expected  to  decrease  this 
performance  measure. 
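The p-link proportion can be sketched by sampling the link's status at each 300-second report time. The Python below is a minimal illustration; placing the reports at the start of the interval and every 300 seconds thereafter is an assumption, and link 3 of Table 1.1 over 06:00 to 07:00 is used only as an example.

```python
def hms(text: str) -> int:
    """Convert 'HH:MM:SS' to seconds after midnight."""
    h, m, s = (int(x) for x in text.split(":"))
    return 3600 * h + 60 * m + s


def p_link(down_intervals, start: int, end: int, report_interval: int = 300) -> float:
    """Proportion of report times in [start, end) at which the link is found up.

    Reports are assumed at start, start + 300, ...; the link counts as down
    at report time t if failure_time <= t < repair_time for some outage.
    """
    report_times = range(start, end, report_interval)
    up_reports = sum(
        1 for t in report_times
        if not any(fail <= t < repair for fail, repair in down_intervals)
    )
    return up_reports / len(report_times)


link3 = [(hms("06:21:00"), hms("07:40:06"))]  # outage from Table 1.1
hour = p_link(link3, hms("06:00:00"), hms("07:00:00"))  # 12 reports in the hour
print(hour)
```

Link 3 is found up at 5 of the 12 reports, so p-link ≈ 0.417, while its exact availability for the same hour is 0.35 (39 of 60 minutes down). With only 12 reports per hour, p-link is a coarse estimate of availability, which is one reason interval length matters.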

For  each  link,  the  Time  Between  Failures  (TBF),  Time  to  Failure  (TTF),  and 
Time  to  Repair  (TTR)  can  be  calculated  for  each  failure  from  the  log  of  failure  and 
repair  times.  Also,  each  link’s  cumulative  Mean  Time  Between  Failures  (MTBF), 
Mean  Time  to  Failure  (MTTF),  and  Mean  Time  to  Repair  (MTTR)  can  be  calculated 
over  all  past  failures  after  each  failure/repair  occurs.  For  these  measures,  a  degradation  of 
an  individual  link’s  performance  is  expected  to  decrease  TBF,  TTF,  MTBF  and  MTTF 
and/or  increase  TTR  and  MTTR.  Also,  from  the  above  measures,  another  availability 
measure, call it SSA, can be calculated for each link individually after each repair as:

$$SSA = \frac{MTTF}{MTTF + MTTR}$$

This  is  a  steady-state  availability  measure  and  will  be  more  accurate  as  time  goes  on  (the 
MTTF  and  MTTR  measures  are  cumulative).  Since  this  is  a  steady-state  measure,  as  time 
goes  on  it  is  expected  that  changes  will  become  harder  and  harder  to  detect.  This 
expectation  will  be  investigated  in  the  case  study.  In  this  last  category  of  performance 
measures  just  described,  the  actual  failure  and  repair  times  are  used  in  addition  to  the 
‘snapshot’. 
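The per-failure and cumulative measures for a single link can be computed from its failure/repair log as sketched below in Python. Two conventions here are assumptions, not taken from the thesis: TTF is measured from the previous repair (or from a known observation start for the first failure), and TBF is the time between successive failure times. The three-failure history is hypothetical.

```python
from statistics import mean


def cumulative_link_stats(events, observation_start):
    """Per-failure TTF, TTR and cumulative MTTF, MTTR, MTBF, SSA for one link.

    events: (failure_time, repair_time) pairs in seconds, sorted and non-overlapping.
    observation_start: assumed time the link was known to be up before its first
    failure, so that the first TTF is defined (a hypothetical convention).
    """
    ttfs, ttrs, tbfs = [], [], []
    last_repair, last_failure = observation_start, None
    rows = []
    for failure, repair in events:
        ttfs.append(failure - last_repair)       # time to failure since last repair
        ttrs.append(repair - failure)            # time to repair
        if last_failure is not None:
            tbfs.append(failure - last_failure)  # time between successive failures
        last_repair, last_failure = repair, failure
        mttf, mttr = mean(ttfs), mean(ttrs)
        rows.append({
            "TTF": ttfs[-1],
            "TTR": ttrs[-1],
            "MTTF": mttf,
            "MTTR": mttr,
            "MTBF": mean(tbfs) if tbfs else None,  # needs at least two failures
            "SSA": mttf / (mttf + mttr),
        })
    return rows


history = [(100, 110), (300, 320), (600, 610)]  # hypothetical failure/repair times (s)
for row in cumulative_link_stats(history, observation_start=0):
    print(row)
```

After the third repair, MTTF = 190 s and MTTR ≈ 13.3 s, so SSA ≈ 0.934. The sample MTBF from observed TBFs is 250 s; MTBF and MTTF + MTTR are equal in expectation but need not agree in a short record like this one.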

2.2.4  Summary  of  Performance  Measures. 

Quite  a  few  performance  measures  have  been  identified  as  candidates  for 
representing  the  performance  of  a  communication  network.  These  performance  measures 
were  chosen  on  the  assumption  that  the  only  data  available  from  the  communication 
network  is  link  failure  times,  repair  times,  and  status  (up  or  down)  at  a  given  time.  These 
identified  measures  are  investigated  in  subsequent  chapters. 
2.3 Previous Theses on Communication Network Performance

Previous theses are investigated in order to discover any applicable methodologies or
insights  that  will  aid  and  support  the  current  research  effort.  There  are  five  previous 
theses  on  communication  network  performance  that  were  accomplished  for  the  sponsor. 
These  were  accomplished  by  Yim  (30),  Bailey  (2),  Gaught  (9),  Jansen  (12),  and  Van 
Hove  (27).  Yim  modeled  the  expected  maximum  flow  of  a  network  to  determine 
optimum  investment  strategies  that  will  improve  stochastic  communication  network 
performance via arc capacity (30:2,25,92). An arc is another term for a link (7:3).

Bailey  used  Monte  Carlo  simulation  to  find  the  expected  throughput  and  expected 
reliability  of  a  stochastic  communication  network  (2:1).  Gaught  built  on  Yim’s  work  and 
developed further investment strategies for improving stochastic communication network
performance via arc capacity and an additional measure, arc reliability (9:2,21). Jansen
investigated the tradeoffs between maximizing throughput and maximizing reliability of a
stochastic communication network (12:2-3). Most recently, Van Hove developed
stochastic  network  flow  models  of  a  communication  network  in  order  to  determine 
bounds  on  average  delay,  bit  error  rate,  throughput,  and  reliability  depending  on  the 
utilization  level  of  the  network  (27:xi). 

Although these thesis efforts provide means of modeling the performance of a
network  over  time,  they  tend  to  be  focused  on  the  flow  of  information  through  the 
network.  As  such,  they  require  information  that  is  not  assumed  to  be  known  or 
observable  in  this  thesis  and,  thus,  appear  to  provide  little  relevant  basis  for  this  research. 

In  addition,  these  models  tend  to  represent  the  behavior  of  links  in  the  network  in 
a somewhat different fashion from that assumed or observed in this research. For
example, Van Hove defines the reliability of a link as “the proportion of time a
component ... is expected to be functional” which, as seen previously, is also a measure of
availability (27:10-11). He models this by assuming that, within a specified interval of
time, a link will either be up or down with a fixed probability, p, that it will be up. He
implicitly  assumes  that  changes  in  state  occur  at  the  start  of  these  time  intervals  and 
explicitly  assumes  that  a  link’s  status  during  a  given  time  interval  is  independent  of  its 
status  in  any  other  time  interval.  Although  Van  Hove  does  not  advocate  any  particular 
duration for this time interval, it appears to be small; a one-second interval is used within
a  case  study. 

Although  this  structure  will  produce  a  modeled  link  that  is  up  the  correct 
proportion  of  time,  the  number  of  state  changes  it  undergoes  or,  equivalently,  the 
durations  of  its  up  and  down  times,  may  not  correspond  to  those  in  the  actual  system. 
One  way  to  see  this  is  to  recall  that  the  availability,  p,  for  a  given  link  can  be  determined 
from  information  about  its  MTTF  and  MTTR  via: 

$$ p = \frac{MTTF}{MTTF + MTTR} $$

Clearly  there  are  an  infinite  number  of  possibilities  for  MTTF  and  MTTR  that  could 
produce  the  same  value  of  p.  Hence,  Van  Hove’s  model  does  not  account  for  the 
particular  up  and  down  time  dynamics  of  the  link.  (This  behavior,  perhaps,  could  be 
modeled  by  relaxing  the  assumption  of  independent  time  intervals  and  explicitly 
recognizing  that  the  probability  that  a  link  will  be  up  in  a  given  time  interval  depends  on 
its  state  in  the  preceding  interval.) 
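The parenthetical suggestion above can be illustrated with a short Python sketch (an illustration only, not a model from Van Hove or from this thesis). It compares two discrete-time link models that share the same availability p: Van Hove-style independent intervals, and a hypothetical two-state Markov chain in which a link that is up fails with probability dt/MTTF per interval and a link that is down is repaired with probability dt/MTTR per interval. That conversion is an assumption that is reasonable only when the interval length dt is much smaller than both MTTF and MTTR; the MTTF and MTTR values used are hypothetical.

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability p = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def mean_up_run_independent(p: float) -> float:
    """Mean number of consecutive 'up' intervals when each interval is
    independently up with probability p (geometric run length)."""
    return 1.0 / (1.0 - p)


def markov_chain(mttf: float, mttr: float, dt: float) -> tuple:
    """Hypothetical two-state chain: per-interval probabilities of failing
    (up -> down) and of being repaired (down -> up).
    Assumes dt is much smaller than both MTTF and MTTR."""
    return dt / mttf, dt / mttr


def stationary_up(fail: float, repair: float) -> float:
    """Long-run fraction of intervals spent up in the two-state chain."""
    return repair / (fail + repair)


def mean_up_run_markov(fail: float) -> float:
    """Mean number of consecutive 'up' intervals in the Markov chain."""
    return 1.0 / fail


if __name__ == "__main__":
    # Two hypothetical links with very different up/down dynamics, dt = 1 s.
    for mttf, mttr in [(9000.0, 1000.0), (90.0, 10.0)]:
        p = availability(mttf, mttr)
        fail, repair = markov_chain(mttf, mttr, dt=1.0)
        print(f"MTTF={mttf:g}s MTTR={mttr:g}s p={p:.2f} "
              f"chain p={stationary_up(fail, repair):.2f} "
              f"independent up-run={mean_up_run_independent(p):.0f} "
              f"Markov up-run={mean_up_run_markov(fail):.0f}")
```

Both hypothetical links have p = 0.9, so the independent-interval model gives each a mean up run of 10 one-second intervals, whereas the Markov chain keeps the same p but reproduces mean up times of 9000 and 90 seconds, respectively.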


2.4  Statistical  Process  Control  (SPC)  Techniques 

Statistical  process  control  techniques,  especially  control  charts,  are  the  primary 
techniques  under  investigation  in  this  thesis  to  monitor  and  evaluate  the  performance  of 
the  sponsor’s  communication  network.  Numerous  sources  discuss  the  various 
techniques, or methods, of SPC. Two prominent and nearly all-encompassing
books on these methods are by Montgomery (18) and Ryan (22). These two sources
overlap considerably; hence, I will cite mainly from one of them and use the other to cover
any  gaps.  Montgomery  defines  SPC  as,  “a  powerful  collection  of  problem-solving  tools 
useful  in  achieving  process  stability  and  improving  capability  through  the  reduction  of 
variability,”  and  lists  seven  major  tools  of  SPC  (18:101): 

1.  Histogram 

2.  Check  sheet 

3.  Pareto  chart 

4.  Cause  and  effect  diagram 

5.  Defect  concentration  diagram 

6.  Scatter  diagram 

7.  Control  chart 

Each  of  these  tools  will  be  described  below,  and  their  relevance  to  this  research  will  be 
established. 

2.4.1  Histogram. 

A  histogram  is  a  graph  used  for  looking  at  the  raw  data  collected  from  a  process. 
The observed frequencies are plotted against the observed values, which facilitates the
display of three properties of the data:

1.  Shape 

2.  Location,  or  central  tendency 

3.  Scatter,  or  spread 

These properties provide insight into the process from just the raw data (18:24). However,
a histogram does not preserve the time order of the observations. Since the sponsor wants
techniques to monitor the network over time, this procedure is not appropriate and will not be used.
2.4.2 Check Sheet.

A  check  sheet  is  useful  in  collecting  historical  or  current  operating  data  about  the 
process.  It  summarizes  the  data  that  is  collected  (types  of  defects  for  example)  by 
categorizing  and  totaling  the  data.  A  time-oriented  summary  is  useful  in  identifying 
trends  or  other  important  patterns  in  the  data  collected  (18:118).  This  type  of  data  (types 
of failures, etc.) is not available; hence, this procedure will not be used.

2.4.3  Pareto  Chart. 

A  Pareto  chart  is  “simply  a  frequency  distribution  (or  histogram)  of  attribute  data 
arranged by category” (18:120). Just as in the histogram discussed earlier, the frequency
of each observed attribute (like the values in the histogram) is plotted against the observed
attribute types. The difference here is that the observed attributes are not numerical
values as in the histogram; they are qualitative instead of quantitative. This procedure,
like the histogram, does not monitor the data with respect to time; hence, it will not
be used either.

2.4.4  Cause  and  Effect  Diagram. 

Once a defect has been identified in a process, the cause and effect diagram is used
as a trouble-shooting aid to find possible causes of the defect. It is simply a pictorial
diagram showing categories of causes and enumerated possible causes contained in each
category (18:121-4). Once again, this type of data (causes for failures) is not available, so
this procedure will not be used.

2.4.5  Defect  Concentration  Diagram. 
This diagram is a pictorial representation of the actual unit that is produced by the
process. The defects are drawn on the unit in order to determine if the physical location of
the defect can provide insight into the cause of the defect (18:124). This pictorial
procedure is not applicable since simply showing link failures on a diagram does not
provide much insight as to the cause of the failures.

2.4.6  Scatter  Diagram. 

The  scatter  diagram  is  used  to  identify  potential  relationships  between  two 
different  variables  in  the  process.  Data  must  be  collected  on  the  two  variables  and  then 
plotted against each other. The resulting plot is then evaluated for any indicated patterns
(e.g., slope, curvature) (18:125). This procedure is potentially useful if there is reason
to believe that two of the performance measures are correlated. However, this procedure
is only used to identify potential relationships, not to establish a cause. The depicted
relationship could instead be caused by a third variable that affects both measures
(18:126).

2.4.7 Control Chart.

A control chart is a graphical display of some measured characteristic of a process
that is plotted over time. The center line on the chart is the average value of the
characteristic. The two other lines on the chart, one above and one below the center line,
are the upper and lower control limits (UCL and LCL respectively). These limits are
chosen such that nearly all of the characteristic points will fall between them when the
process is “in control.” When a point plots outside these limits, this is evidence that the
process is “out-of-control” and an investigation is required to find the cause of this
behavior in the process. This cause is called an assignable cause (18:103).
Assignable causes are sources of process variability that are other than the chance causes
(background  noise)  inherent  in  the  process  (18:102).  A  sample  control  chart  is  shown  in 
Figure  2.1. 


Sample 

Characteristic 


UCL 

LCL 

1  I  I  I  I . I  I  I  I  I 

Sample  number  or  Time 

Figure  2.1  Sample  control  chart 


Center  Line 


There  are  numerous  types  of  control  charts  that  are  used  to  display  different  types  of 
characteristics.  The  two  main  categories  of  these  types  of  control  charts  are:  Control 
Charts  for  Attributes  and  Control  Charts  for  Variables.  These  different  types  of  charts  are 
described  next. 


2.4.8  Control  Charts  for  Attributes.  Attributes  are  characteristics  of  a 
process  that  cannot  be  conveniently  represented  numerically.  An  example  of  this  type  of 
characteristic  is  the  status  of  a  link  being  ‘up’  or  ‘down’.  Three  widely  used  attributes 
control  charts  are  the  p  chart,  c  chart,  and  u  chart  (18:147). 

2.4.8.1 P chart. This is also called a control chart for fraction
nonconforming. The population fraction nonconforming is the ratio of the number of
nonconforming items in a population to the total number of items in a population
(18:148). This ratio is computed for each sample using the total number of items in a
sample. This chart would depict the fraction of components that are down
(nonconforming) in the network. A variation of this chart also exists for the fraction
conforming, also called a p chart, and for the number of nonconforming items, called an
np chart (18:148,162).


2.4.8.2  C  chart.  This  is  also  called  a  control  chart  for  nonconformities. 
This  chart  depicts  the  number  of  nonconformities  observed  in  a  unit.  The  unit  is  a  sample 
of  constant  size  (usually  one  but  not  always)  (18:172). 

2.4.8.3  U  chart.  This  is  also  called  a  control  chart  for  nonconformities 
per  unit.  This  chart  depicts  the  average  number  of  nonconformities  per  unit  and  is  used 
when  the  unit  sample  size  is  not  constant  (18:176-80). 
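The c and u chart limits are not written out above. Under the usual Poisson model for counts of nonconformities (variance equal to the mean), the standard three-sigma limits are CL = c-bar and c-bar ± 3√(c-bar) for the c chart, and u-bar ± 3√(u-bar/n_i) for the u chart, where n_i is the size of sample i. The Python sketch below computes them, truncating negative lower limits at zero; the function names are hypothetical.

```python
import math


def c_chart_limits(counts):
    """Three-sigma c chart limits (LCL, CL, UCL) from nonconformity counts
    per unit of constant size, assuming a Poisson model (variance = mean)."""
    c_bar = sum(counts) / len(counts)
    half_width = 3.0 * math.sqrt(c_bar)
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width


def u_chart_limits(counts, unit_sizes):
    """Three-sigma u chart limits for each sample; because the sample size
    n_i varies, each sample gets its own (LCL, CL, UCL) triple."""
    u_bar = sum(counts) / sum(unit_sizes)
    limits = []
    for n_i in unit_sizes:
        half_width = 3.0 * math.sqrt(u_bar / n_i)
        limits.append((max(0.0, u_bar - half_width), u_bar, u_bar + half_width))
    return limits
```

Note that the u chart limits narrow as n_i grows, which is why the u chart, rather than the c chart, is used when the unit sample size is not constant.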

2.4.9  Control  Charts  for  Variables.  When  the  quality  characteristics  of  a 
process  can  be  expressed  as  a  numerical  measurement,  control  charts  for  variables  can  be 
used.  The  characteristic  that  is  measured  is  called  a  variable.  It  is  standard  practice  in 
using  these  charts  to  plot  both  the  process  mean  and  variability  on  separate  charts.  This 
can  be  accomplished  using  x-bar  charts,  R  charts  and  S  charts  (18:201). 

2.4.9.1  X-bar  chart.  This  is  also  called  a  control  chart  for  means.  This 
chart  depicts  the  mean  (average  value)  of  the  measured  characteristic  in  a  sample  of 
observations  from  the  process  (18:203). 

2.4.9.2  R  chart.  This  is  also  called  a  control  chart  for  the  range.  This 
chart  depicts  the  range  of  values  (the  difference  between  the  largest  and  smallest 
observations)  of  the  measured  characteristic  in  a  sample  from  the  process.  This  chart  is 
used to monitor process variation (18:203-5).

2.4.9.3  S  chart.  This  is  also  called  a  control  chart  for  the  standard 
deviation.  This  chart  depicts  the  sample  standard  deviation  of  the  measured  characteristic 
in  a  sample  from  the  process.  This  chart  is  also  used  to  monitor  process  variation  and  is 
preferred  over  the  R  chart  when  either  the  sample  size  is  moderately  large  (greater  than 
10 or 12) or the sample size is variable (18:230). A variation of this chart also exists for
the sample variance (S²), called an S² chart (18:239).

2.4.10  Runs  Rules.  A  disadvantage  of  all  of  the  previously  discussed  control 
charts  (also  known  as  Shewhart  control  charts)  is  that  they  ignore  any  information  given 
by the entire sequence of points on the chart; they evaluate only the most recently plotted
point (18:279). This can be “remedied” by applying the following “sensitizing rules”
(or  runs  rules)  to  a  control  chart  to  detect  an  “out-of-control”  condition: 

1. One or more points outside the control limits.

2. A run of at least eight points, where the run could either be a run up or down, a
run above or below the center line, or a run above or below the median.

3. Two of three consecutive points outside the 2-sigma warning limits but still
inside the control limits.

4. Four of five consecutive points beyond the 1-sigma limits.

5. An unusual or nonrandom pattern in the data.

6. One or more points near a warning or control limit. (18:117,279)
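Rules 1 through 4 are mechanical enough to automate. The Python sketch below is one possible implementation, assuming the center line and the standard deviation (sigma) of the plotted statistic are known. It interprets rule 2 only as a run of eight points on the same side of the center line, and rules 3 and 4 as points beyond 2 and 1 sigma on the same side of the center line (the common Western Electric reading); points beyond 3 sigma are also caught by rule 1. Rules 5 and 6 require judgment and are not coded.

```python
def runs_rule_signals(points, center, sigma):
    """Return {index: [rule numbers]} for points that trigger rules 1-4.

    Rule 1: a point beyond the 3-sigma control limits.
    Rule 2: eight consecutive points on the same side of the center line.
    Rule 3: two of three consecutive points beyond 2 sigma (same side).
    Rule 4: four of five consecutive points beyond 1 sigma (same side).
    A signal is attributed to the last point of the qualifying window.
    """
    z = [(x - center) / sigma for x in points]
    signals = {}

    def flag(i, rule):
        signals.setdefault(i, []).append(rule)

    for i, zi in enumerate(z):
        if abs(zi) > 3:
            flag(i, 1)
        if i >= 7:
            window = z[i - 7 : i + 1]
            if all(v > 0 for v in window) or all(v < 0 for v in window):
                flag(i, 2)
        if i >= 2:
            window = z[i - 2 : i + 1]
            if sum(v > 2 for v in window) >= 2 or sum(v < -2 for v in window) >= 2:
                flag(i, 3)
        if i >= 4:
            window = z[i - 4 : i + 1]
            if sum(v > 1 for v in window) >= 4 or sum(v < -1 for v in window) >= 4:
                flag(i, 4)
    return signals
```

Because each additional rule adds its own chance of a false alarm, applying all of them together raises the in-control false alarm rate relative to rule 1 alone.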

These runs rules are applied to control charts to better detect a small shift in the process
(on the order of about 1.5σ or less). Hence, they should be applied if the
process is expected to ‘decay’ or ‘wear down’ slowly over time. If these extra
rules and warning limits are seen as too cumbersome, two other control charts can be
used instead: CUSUM and EWMA charts.

2.4.11  CUSUM  Charts.  The  CUSUM  chart  is  a  cumulative-sum  control  chart. 
This  type  of  chart  can  be  used  for  many  different  sample  statistics  such  as  averages, 
ranges,  standard  deviations,  and  fractions  nonconforming,  and  is  particularly  effective 
with samples of size one (18:280,299). This chart is effective in detecting small process
shifts because it incorporates information from several samples instead of just one, as
the x-bar chart does. This is accomplished by plotting the cumulative sums of the deviations
of the sample values from a target value. For example, if x-bar_j is the average of the jth
sample and μ0 is the target value for the process mean, the cumulative sum is calculated
by:

$$ S_i = \sum_{j=1}^{i} \left( \bar{x}_j - \mu_0 \right) $$

where S_i is the cumulative sum up to and including sample i (18:279-80). If the process
remains in control at the target value μ0, S_i should fluctuate around zero. Hence, an
upward or downward trend on the chart is evidence that the process has shifted.
To  determine  whether  the  process  is  out-of-control,  a  V-mask  procedure  is  applied  to  the 
CUSUM  chart.  The  V-mask  procedure  is  similar  to  control  limits  on  the  previous 
Shewhart  control  charts.  Detailed  procedures  for  constructing  and  using  the  CUSUM 
chart  and  the  V-mask  along  with  a  tabular  form  of  the  CUSUM  are  contained  in 
Montgomery  (18:282-296)  and  Ryan  (22). 
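Both the plain cumulative sum S_i and the tabular CUSUM can be sketched in a few lines of Python. The tabular form below accumulates deviations above μ0 + K and below μ0 - K separately and signals when either sum exceeds H, where K = kσ and H = hσ. The defaults k = 0.5 and h = 5 are common textbook choices, not values chosen in this thesis, and σ must be the standard deviation of the plotted statistic (σ/√n for sample averages).

```python
from itertools import accumulate


def cusum_plain(xbars, mu0):
    """S_i = sum over j <= i of (xbar_j - mu0), the quantity used with a V-mask."""
    return list(accumulate(x - mu0 for x in xbars))


def cusum_tabular(xs, mu0, sigma, k=0.5, h=5.0):
    """Tabular CUSUM. Returns (C_plus, C_minus, alarm_indices).

    K = k * sigma is the reference (slack) value and H = h * sigma the
    decision interval.
    """
    K, H = k * sigma, h * sigma
    c_plus, c_minus, alarms = [], [], []
    cp = cm = 0.0
    for i, x in enumerate(xs):
        cp = max(0.0, x - (mu0 + K) + cp)  # accumulates upward deviations
        cm = max(0.0, (mu0 - K) - x + cm)  # accumulates downward deviations
        c_plus.append(cp)
        c_minus.append(cm)
        if cp > H or cm > H:
            alarms.append(i)
    return c_plus, c_minus, alarms
```

Resetting each sum at zero keeps the tabular CUSUM from drifting when the process is on target, which is what lets a fixed decision interval H replace the V-mask.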

2.4.12  EWMA  Charts.  The  EWMA  chart  is  an  Exponentially  Weighted 
Moving-Average  control  chart,  also  called  a  Geometric  Moving  Average  chart.  This 
chart  is  also  effective  in  detecting  small  process  shifts,  can  also  be  extended  for  other 
sample  statistics  besides  sample  averages,  and  is  also  effective  with  samples  of  size  one 
(18:299-300,306)  (22:122).  The  EWMA  is  a  weighted  average  of  all  previous  sample 
statistics.  Hence,  it  incorporates  information  from  several  samples  instead  of  just  one  like 
the  x-bar  chart.  An  out-of-control  condition  is  determined  from  control  limits  similarly  to 
Shewhart control charts. An advantage of the EWMA over the CUSUM is the EWMA’s
ability to provide a forecast of where the process statistic will be at the next time period.
One drawback, though, of both the EWMA and the CUSUM charts is that they do not
react as quickly to large shifts in the process as the x-bar chart does. Therefore, to cover
both large and small shifts in a process, Montgomery suggests either using x-bar and
EWMA procedures together, as separate charts or even on the same chart with each
one’s respective limits plotted, or using x-bar and CUSUM procedures together on
separate charts (18:297,306). Detailed procedures for constructing and using the EWMA
chart are contained in Montgomery (18:300-6).
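The EWMA statistic and its limits can be sketched as follows, assuming a known target μ0 and standard deviation σ of the plotted statistic. The recursion is z_i = λx_i + (1 - λ)z_{i-1} with z_0 = μ0, and the limits are μ0 ± Lσ√(λ/(2 - λ)[1 - (1 - λ)^{2i}]), which widen toward a steady-state value as i grows. The defaults λ = 0.2 and L = 3 are illustrative assumptions, not values from this thesis.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """EWMA statistic z_i = lam*x_i + (1 - lam)*z_{i-1}, with z_0 = mu0,
    and its time-varying control limits. Returns a list of
    (z_i, LCL_i, UCL_i, out_of_control) tuples for i = 1, 2, ..."""
    z = mu0
    rows = []
    for i, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        width = L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i)))
        lcl, ucl = mu0 - width, mu0 + width
        rows.append((z, lcl, ucl, not (lcl <= z <= ucl)))
    return rows
```

Because z_i is a weighted average of all past observations, it also serves as the one-step-ahead forecast mentioned above.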


2.5  Prior  Applications 

Only  two  instances  of  an  attempt  to  apply  SPC  techniques  to  the  communication 
network  field  were  found.  The  first  applies  control  charts  not  to  a  communication 
network,  but  to  the  monitoring  of  software  development  for  GTE  Communications 
Systems Corporation (29:29.4.1). The second, more relevant, source is a Master’s thesis
by Beadles from the Naval Postgraduate School, which gives an overview of “basic SPC
tools that are common to most total quality organizations [and] ... highlights more
sophisticated tools used in the communications industry” (3:2). Also presented in this
thesis is a case study of applying SPC methods to improve a communications process.
Although  Beadles’  thesis  provides  a  comprehensive  survey  of  SPC  tools  and  other 
statistical  methods  for  process  improvement,  little  attention  is  given  to  monitoring  a 
network  over  time  (with  the  exception  of  a  case  study  in  which  control  charts  are  used  to 
monitor  the  average  time  to  clear  an  AUTOVON  circuit)  (3:86-94).  Thus,  Beadles’ 
thesis  provides  a  useful  review  of  SPC  and  statistical  techniques  for  communications 
engineers but has little direct relevance to the objectives of the current thesis.




2.6  Summary 

Based on the data observable from the sponsor’s communication network, the
applicable performance measures have been identified. Of the many SPC techniques
available,  control  charts  seem  well  suited  to  monitoring  these  identified  performance 
measures.  The  literature  reviewed  to  date  has  not  applied  this  SPC  technique  to 
monitoring  these  particular  performance  measures  of  a  communications  network.  Hence, 
the  applicability  of  control  charts  to  monitoring  these  performance  measures  will  be 
investigated  in  this  research. 
3. Methodology


Now  that  applicable  performance  measures  have  been  identified,  methods  for 
using  control  charts  to  monitor  these  measures  are  discussed  in  detail  below.  Also,  as 
stated  in  Chapter  1,  data  from  the  actual  communication  network  as  well  as  the 
description  of  the  network’s  topology  was  not  available  from  the  sponsor.  Therefore,  in 
order  to  obtain  sample  data  for  charting  purposes,  a  computer  simulation  model  was 
created  to  generate  data  that  is  expected  to  be  representative  of  that  generated  by  the 
actual  communication  network.  This  simulation  model  is  also  discussed  below. 


3.1  Control  Charts 

There  are  many  different  types  of  control  charts  available  for  use.  This  section 
describes  the  usage  of  what  appear  to  be  the  most  applicable  types  for  the  performance 
measures  identified  in  the  previous  chapter.  These  types  are:  x-bar  and  R  charts,  XmR 
charts,  and  p  charts.  Each  is  discussed  below. 

3.1.1  X-bar  and  R  Charts. 

These charts are used for data that are numerical measurements of the system
being monitored (18:201). These measurements are then organized into subgroups
(samples) of size greater than one, and each sample is summarized by an average (x-bar)
and a range (R) (28:40). The sample x-bars are plotted against time on the x-bar chart
to monitor the process mean, while the sample ranges are plotted on the R chart
to monitor the process variability or dispersion (18:201). If the mean (μ) and
standard deviation (σ) of the distribution of the measurements taken on the process when
it is in control are known, the upper and lower control limits (UCL and LCL respectively)
and the centerline (CL) for the x-bar chart, with samples of size n, are calculated as:

$$ UCL = \mu + 3\frac{\sigma}{\sqrt{n}}, \qquad CL = \mu, \qquad LCL = \mu - 3\frac{\sigma}{\sqrt{n}} $$

and the limits for the R chart are calculated as:

$$ UCL = D_2\sigma, \qquad CL = d_2\sigma, \qquad LCL = D_1\sigma $$

where

$$ D_1 = d_2 - 3d_3, \qquad D_2 = d_2 + 3d_3 $$

are tabulated constants dependent on sample size given in Appendix A (18:221).
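The known-parameter limits translate directly into code. In this Python sketch, the d2 and d3 values for subgroup sizes 2 through 6 are the standard textbook constants and are assumed to agree with Appendix A; D1 is truncated at zero, as in the tables, when d2 - 3d3 is negative.

```python
import math

# Standard control chart constants (d2, d3) for subgroup sizes 2-6,
# assumed to agree with the tables in Appendix A.
D2_D3 = {2: (1.128, 0.853), 3: (1.693, 0.888), 4: (2.059, 0.880),
         5: (2.326, 0.864), 6: (2.534, 0.848)}


def xbar_r_limits_known(mu, sigma, n):
    """x-bar and R chart limits, each as (LCL, CL, UCL), when the
    in-control mean mu and standard deviation sigma are known."""
    d2, d3 = D2_D3[n]
    half = 3.0 * sigma / math.sqrt(n)
    xbar = (mu - half, mu, mu + half)
    big_d1 = max(0.0, d2 - 3.0 * d3)  # the constant D1, tabulated as 0 if negative
    big_d2 = d2 + 3.0 * d3            # the constant D2
    r = (big_d1 * sigma, d2 * sigma, big_d2 * sigma)
    return xbar, r
```

For example, with μ = 10, σ = 2, and n = 4, the x-bar limits are 7 and 13 and the R chart limits are 0 and 9.398 around a centerline of 4.118.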

If  the  in-control  process  mean  and  standard  deviation  are  not  known,  they  must  be 
estimated  from  the  sample  data.  This  sample  data  is  typically  taken  from  the  process 
when it is assumed to be in control, and then the mean is estimated as the grand average of
the m sample averages:

$$ \bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_m}{m} $$

where m = number of samples.
The standard deviation is estimated from the ranges of the m samples via:

$$ R_i = x_{\max,i} - x_{\min,i}, \qquad \bar{R} = \frac{R_1 + R_2 + \cdots + R_m}{m}, \qquad \hat{\sigma} = \frac{\bar{R}}{d_2} $$

where d2 is a tabulated constant for various sample sizes, also given in Appendix A. In
this case, the control limits for the x-bar chart are calculated as:

$$ UCL = \bar{\bar{x}} + A_2\bar{R}, \qquad CL = \bar{\bar{x}}, \qquad LCL = \bar{\bar{x}} - A_2\bar{R} $$

where A2 = 3/(d2√n) is another tabulated constant given in Appendix A (18:203-5). The
control limits for the R chart are calculated as:


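$$ UCL = \bar{R} + 3d_3\frac{\bar{R}}{d_2}, \qquad CL = \bar{R}, \qquad LCL = \bar{R} - 3d_3\frac{\bar{R}}{d_2} $$

where d3, like d2, is a tabulated constant given in Appendix A. These limits can be
redefined as:

$$ UCL = \bar{R}D_4, \qquad CL = \bar{R}, \qquad LCL = \bar{R}D_3 $$

where

$$ D_3 = 1 - 3\frac{d_3}{d_2}, \qquad D_4 = 1 + 3\frac{d_3}{d_2} $$

are more tabulated constants given in Appendix A (18:205-6).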
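The trial-limit calculations above can be sketched in Python as follows. The A2, D3, and D4 values for subgroup sizes 2 through 6 are the standard textbook constants, assumed to agree with Appendix A; the subgroup data passed in would come from the simulation model or the network, not from this sketch.

```python
from statistics import mean

# Standard constants (A2, D3, D4) for subgroup sizes 2-6,
# assumed to agree with Appendix A.
CONSTANTS = {2: (1.880, 0.0, 3.267), 3: (1.023, 0.0, 2.574),
             4: (0.729, 0.0, 2.282), 5: (0.577, 0.0, 2.114),
             6: (0.483, 0.0, 2.004)}


def xbar_r_trial_limits(subgroups):
    """Trial x-bar and R chart limits, each as (LCL, CL, UCL),
    from m equal-size subgroups of measurements."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("all subgroups must have the same size")
    A2, D3, D4 = CONSTANTS[n]
    xbars = [mean(g) for g in subgroups]            # sample averages
    ranges = [max(g) - min(g) for g in subgroups]   # sample ranges R_i
    grand_mean, r_bar = mean(xbars), mean(ranges)
    xbar_limits = (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar)
    r_limits = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar_limits, r_limits
```

Points falling outside these trial limits would then be examined as described in Section 3.1.5.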


3.1.1.1 Rational Subgroups. When monitoring a process, an analyst
often has some flexibility in determining when measurements should be taken from the
process and how these measurements should be grouped over time into samples for
plotting. Ideally, these measurements should be taken so as to minimize the variation
(range) within the samples and maximize the variability between samples. This is
necessary since the control chart limits are calculated using this within-sample variation;
if it is too large, the control limits will be too wide to detect variation
between the samples (28:100). In general, the subgroups should be selected to maximize
the chance of an assignable cause occurring between samples and minimize the chance of
it occurring within a sample. This concept is called rational subgrouping (18:113).

There  are  two  approaches  for  selecting  rational  subgroups.  The  first  approach  is 
used  if  shifts  in  the  process  are  of  interest.  Here,  each  sample  (subgroup)  should  contain 
measurements  that  are  observed  as  close  together  as  possible.  This  should  ensure  that  the 
samples are independent of each other and minimize the chance of variability within the
sample, since the measurements are close together in time. If an assignable cause occurs, it is more likely
to  happen  between  samples  than  within  a  sample  (18:113). 
The second approach is used when the sample is to be representative of process
performance  since  the  last  sample.  Subgroups  chosen  this  way  are  usually  spread  out  over 
the  entire  sampling  interval.  When  data  is  collected  in  this  way,  care  must  be  taken  in 
estimating  the  within  sample  variability  since  an  assignable  cause  could  occur  during  the 
sampling  interval.  A  shift  in  the  process  mean  may  then  cause  the  range  within  a  sample 
to be very large. This could cause the estimated control limits to be too wide, or it could
cause points on the R chart to plot out-of-control (indicating a shift in the process
variability) when the shift has actually been in the process mean. This possibility must be
kept in mind when interpreting control charts with these types of subgroups (18:113-4).

3.1.1.2 Autocorrelation Between Samples. The standard application of
control charts and runs rules assumes that, when a process is in control, samples taken
from that process are independent. This assumption may not hold for the sponsor’s
communication network, where the basic reporting interval (300 seconds) is shorter than
the average link down time (754 seconds). Thus, measures such as the number of links
down observed at a reporting time are probably autocorrelated. Hence, care must be taken
in choosing the sampling interval for these types of performance measures. If the
sampling interval is large, small shifts might not be detected and any shifts will take
longer to detect. If the interval is small, small shifts may be detected and shifts can be
detected faster, but the data may be autocorrelated. This possible autocorrelation is not
accounted for by conventional control charts and runs rules and would therefore require
the special procedures outlined by Montgomery (18:341-51). This issue of sampling
interval and autocorrelation will be investigated during the case study, where the actual
repair rate will be known.
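Before charting a measure such as DwnLnk at the 300-second reporting interval, the degree of autocorrelation can be screened with the lag-k sample autocorrelation r_k = Σ_{t=1}^{N-k}(x_t - x-bar)(x_{t+k} - x-bar) / Σ_{t=1}^{N}(x_t - x-bar)². The Python sketch below computes r_k and compares it with the rough ±2/√N bound that applies to uncorrelated data; this is a standard screening check, not one of the special procedures cited above, and the function names are hypothetical.

```python
from statistics import mean


def sample_autocorrelation(xs, k):
    """Lag-k sample autocorrelation r_k of a series (1 <= k < len(xs))."""
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("lag must satisfy 1 <= k < len(xs)")
    xbar = mean(xs)
    denom = sum((x - xbar) ** 2 for x in xs)
    numer = sum((xs[t] - xbar) * (xs[t + k] - xbar) for t in range(n - k))
    return numer / denom


def looks_autocorrelated(xs, k=1):
    """Rough check: is |r_k| larger than the approximate 2/sqrt(N) bound
    expected for uncorrelated data?"""
    return abs(sample_autocorrelation(xs, k)) > 2.0 / len(xs) ** 0.5
```

A strongly positive r_1 for a series of DwnLnk counts would suggest that consecutive reports share the same outages, in which case a longer sampling interval or the special procedures above would be needed.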

3.1.1.3 X-bar and R Chart Performance Measures. X-bar and R
charts will be used to chart the following performance measures: number of links down,
DwnLnk; proportion of operating paths, p-path; and Availability = proportion of link
uptime (calculated hourly and daily for each link).


3.1.2  XmR  Charts. 

These  individual  measurements  charts  are  designed  for  use  when  the  sample  size 
is  one  (18:241).  This  sample  size  can  occur  when  data  is  collected  periodically  or  when 
the  data  just  cannot  be  subgrouped  for  some  reason.  Here  each  data  value  is  uniquely 
identified  with  a  specific  period  of  time  and  the  frequency  of  collection  is  fixed.  If  the 
sample  size  is  increased  to  more  than  one,  this  could  create  non-homogeneous  samples 
that represent more than one time period (28:217). The homogeneity of such samples
then depends on the homogeneity of the time periods combined. If the time periods cannot be
grouped together, XmR charts are required.

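The control limits for the X chart are computed as:

$$ UCL = \bar{x} + 3\frac{\overline{mR}}{d_2}, \qquad CL = \bar{x}, \qquad LCL = \bar{x} - 3\frac{\overline{mR}}{d_2} $$

where d2 = 1.128 for a moving range of n = 2 (see Appendix A). So, the control limits are
simplified to:

$$ UCL = \bar{x} + 2.66\,\overline{mR}, \qquad CL = \bar{x}, \qquad LCL = \bar{x} - 2.66\,\overline{mR} $$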

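The control limits for the mR chart are computed as:

$$ UCL = \overline{mR}\,D_4, \qquad CL = \overline{mR}, \qquad LCL = \overline{mR}\,D_3 = 0 $$

where

$$ mR_i = |x_i - x_{i-1}|, \qquad D_3 = 0, \qquad D_4 = 3.267 $$

for n = 2, and mR-bar is the average of the moving ranges (18:242-3).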
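The XmR calculations can be sketched in Python as follows, using the n = 2 constants given above (2.66, D3 = 0, D4 = 3.267). The input would be a series of individual values, such as successive times to repair; any example values are hypothetical.

```python
from statistics import mean


def xmr_limits(xs):
    """Individuals (X) and moving range (mR) chart limits, each as
    (LCL, CL, UCL), using the n = 2 constants 2.66 (= 3/1.128),
    D3 = 0, and D4 = 3.267."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(xs, xs[1:])]  # mR_i = |x_i - x_{i-1}|
    x_bar, mr_bar = mean(xs), mean(moving_ranges)
    x_limits = (x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar)
    mr_limits = (0.0, mr_bar, 3.267 * mr_bar)
    return x_limits, mr_limits
```

Since N observations yield only N - 1 moving ranges, the mR chart has one fewer point than the X chart.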

3.1.2.1  XmR  Chart  Performance  Measures.  The  XmR  charts  will  be 
used  with  the  measures:  Availability,  Time  Between  Failures  (TBF),  Time  to  Failure 
(TTF),  and  Time  to  Repair  (TTR)  (each  described  for  X-bar  charts  above),  Mean  Time 
Between  Failures  (MTBF),  Mean  Time  to  Failure  (MTTF),  and  Mean  Time  to 
Repair  (MTTR)  (calculated  over  all  past  failures  after  each  failure/repair  occurs),  and 
SSA  =  steady-state  availability  (calculated  for  each  link  individually  after  each  repair). 

3.1.3  P  Charts. 

This chart is also called the Fraction Nonconforming chart. The fraction
nonconforming per sample is just:

$$ \hat{p} = \frac{\text{number of nonconforming items in a sample}}{\text{total number of items in a sample}} $$

It is customary to work with the fraction nonconforming, but the fraction conforming can
be used just as easily if desired (18:148). The p chart is based on a count from a binomial
distribution. This means that each item in a sample of n items is classified as either
conforming or nonconforming to a specification. The count of units nonconforming in a
sample, D, has a binomial distribution with parameters n, the sample size, and p, the
probability of a unit being nonconforming (or fraction nonconforming) (18:148). This implies
that the value of p is the same for all n items in any one sample.

This  binomial  probability  model  may  be  used  when,  according  to  Wheeler  and 
Chambers  (28:260),  four  conditions  are  satisfied: 

1. The sample size for count D consists of n distinct items.

2. Each of the n distinct items is classified as either conforming or
nonconforming.

3. The count, D, is the number of nonconforming items in the sample.

4. Counts are independent of each other. The preceding item’s
conformance/nonconformance does not affect the following item’s
classification.

If  these  four  conditions  are  not  satisfied,  the  p  chart  should  not  be  used.  An  XmR  chart 
could  be  used  instead. 

If the binomial model is deemed appropriate, the mean (μ) and standard deviation
(σ) of a binomial count are defined as (28:261):

$$ \mu = np, \qquad \sigma = \sqrt{np(1-p)} $$

where n = sample size and p is the theoretical fraction nonconforming. Alternately, the
mean (μ) and standard deviation (σ) of the sample fraction nonconforming, p-hat, are
defined as (18:148):

$$ \mu = p, \qquad \sigma = \sqrt{\frac{p(1-p)}{n}} $$


The control limits for the p chart when p is unknown are calculated as:

$$ UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}}, \qquad CL = \bar{p}, \qquad LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} $$

where

$$ \bar{p} = \frac{\sum_{i=1}^{m} D_i}{mn} $$

and m = number of samples, n = sample size, and D_i = number nonconforming in sample i. If
the true fraction nonconforming, p, is known, it is used in the above limits in place of p-bar
(18:148-50). This type of chart is useful when the attributes of interest are not easily
represented numerically (such as a communication link’s status as up or down); such
attributes can be easily monitored on a p chart.
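A Python sketch of the p chart limits follows, assuming a constant sample size n (for example, a hypothetical network of n links observed at each reporting time, with D_i links down). The lower limit is truncated at 0 and the upper limit at 1, since a fraction cannot fall outside [0, 1]. Whether the binomial conditions listed above, particularly the independence of items, actually hold for link states is a separate question.

```python
import math


def p_chart_limits(nonconforming_counts, n, p=None):
    """Three-sigma p chart limits for samples of constant size n.

    nonconforming_counts holds D_i for each of the m samples. If the true
    fraction nonconforming p is known it is used as the centerline;
    otherwise p-bar = sum(D_i) / (m * n) is used.
    Returns (LCL, CL, UCL, p_hats), where p_hats are the plotted fractions.
    """
    m = len(nonconforming_counts)
    center = p if p is not None else sum(nonconforming_counts) / (m * n)
    half = 3.0 * math.sqrt(center * (1.0 - center) / n)
    p_hats = [d / n for d in nonconforming_counts]
    return max(0.0, center - half), center, min(1.0, center + half), p_hats
```

With a small p-bar and few links per sample, the lower limit is often truncated to zero, so this chart would signal mainly on increases in the fraction of links down.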
3.1.3.1 P Chart Performance Measures. The p chart will be used with
the  measures:  proportion  of  operating  links,  p-up  (calculated  at  each  ‘state’),  p-down 
(calculated  at  each  ‘state’),  p-path  (as  described  earlier),  and  p-link  (calculated  for  each 
link  individually  at  each  ‘state’). 

3.1.3.2 np Chart. This is a chart for the number nonconforming. This
chart monitors the binomial count of units nonconforming, D, that is used to compute the
fraction nonconforming above. Hence, the mean (μ) and standard deviation (σ) are those
defined earlier for the binomial count, with the following corresponding control limits
when p is unknown (18:162):

$$ UCL = n\bar{p} + 3\sqrt{n\bar{p}(1-\bar{p})}, \qquad CL = n\bar{p}, \qquad LCL = n\bar{p} - 3\sqrt{n\bar{p}(1-\bar{p})} $$


where  p-bar  is  defined  above.  As  for  the  p  chart,  the  theoretical  fraction  nonconforming, 
p,  is  used  in  the  above  limits  in  place  of  p-bar  if  it  is  known.  Any  of  the  binomial  count 
performance  measures  can  be  plotted  on  this  type  of  chart,  such  as  the  number  of  links 
down,  DwnLnk. 

3.1.4  Data  Distribution. 

There is a common belief that the data must be normally distributed in order to use a control chart. This belief stems from the tabulated constants used in control limit computations. The values of these constants are computed assuming a normal distribution, but they do not change appreciably when the data is not normal (28:65). Wheeler and Chambers conducted their own study of six different distributions, including such 'heavy-tailed' distributions as the Exponential and Chi-Square (28:66). According to this study, "even wide departures from normality will have virtually no effect upon the way the control charts function in identifying uncontrolled variation" (28:76).

3.1.5  Trial  Control  Limits. 

When the standards (parameters) of the data's distribution are unknown, they must be estimated from sample data and used to compute trial control limits. It is common practice to collect 20 to 30 samples to calculate the trial limits, but this is not necessary if only limited amounts of data are available (28:45). The trial control limits are then applied to the sample data to determine whether the process was in control when the sample data was collected. Montgomery states that if some of the sample data points plot outside of the control limits, they should be examined for an assignable cause (18:150). If one is found, the point is discarded from the trial control limits calculation. If no assignable cause is found, the point can either be discarded as having been drawn from a probability distribution characteristic of an out-of-control state, or it can be retained if the limits are deemed appropriate for current control. He also notes that sometimes many of the sample data points plot outside of the control limits; in this case it is more productive to look for a pattern among these points than to blindly exclude or include them all (18:150). In a later section of his book, however, Montgomery states that if any of the preliminary samples plot outside of the trial control limits, the samples are simply discarded and revised control limits are calculated (18:241). It is clear, then, that no set policy exists for discarding out-of-control points while calculating trial limits. In fact, Wheeler and Chambers state that "Control limits ... will usually detect a lack of control when it exists even though the out-of-control points were used in the computation" (28:226). Hence, any decision regarding the exclusion of points in the computation of trial control limits is left to the discretion of the user.
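Because the exclusion policy is left to the user, a sketch can make it a parameter. The Python below (hypothetical names, assuming a p chart with p estimated from the data) recomputes trial limits, dropping only out-of-control samples that a user-supplied `discard` function marks for removal, for example because an assignable cause was found.

```python
import math


def p_limits(counts, n):
    """3-sigma p chart limits (LCL, CL, UCL) with p-bar estimated from counts."""
    p_bar = sum(counts) / (len(counts) * n)
    half_width = 3 * math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), p_bar, min(1.0, p_bar + half_width)


def revise_trial_limits(counts, n, discard):
    """Recompute trial limits until no remaining sample is both outside the
    limits and marked for exclusion by discard(index)."""
    kept = list(range(len(counts)))
    while True:
        lcl, cl, ucl = p_limits([counts[i] for i in kept], n)
        outside = [i for i in kept
                   if not (lcl <= counts[i] / n <= ucl) and discard(i)]
        # Stop when nothing is dropped (or when dropping would leave no data).
        if not outside or len(outside) == len(kept):
            return (lcl, cl, ucl), kept
        kept = [i for i in kept if i not in outside]


counts = [4, 5, 3, 4, 5, 4, 20]  # hypothetical links-down counts, n = 77
# Always discarding mimics "simply discard and recalculate" (18:241).
limits, kept = revise_trial_limits(counts, 77, discard=lambda i: True)
# Never discarding keeps every point in the computation (28:226).
limits_all, kept_all = revise_trial_limits(counts, 77, discard=lambda i: False)
```

Here sample 6 (20 links down) is dropped in the first case and retained in the second; which behavior is appropriate remains the user's call.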


3.2  Data  Simulation 

3.2.1  Development  of  Simulation  Model. 

A simulation model was created to generate simulated data representing the sponsor's communication network. It was developed using the SLAM II simulation language in conjunction with user-written FORTRAN inserts. The sole purpose of the simulation is to generate data from the communication network that is currently unavailable. The main activities being simulated are the failure and repair of the links contained in the network; the rest of the simulation code collects and calculates statistics on these link failures and repairs. Path enumeration subroutines are adopted from Van Hove to enable the simulation to monitor the status of paths as well as links. These subroutines use a depth-first-search method on a tree representation of the network to enumerate all the paths from a source to a sink (27:34-6, 82-6).
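The idea behind path enumeration can be sketched as a recursive depth-first search that extends a path one link at a time and records it whenever the sink is reached. This is a generic Python illustration, not a translation of Van Hove's FORTRAN subroutines; treating links as undirected and the example topology are assumptions.

```python
def enumerate_paths(links, source, sink):
    """All simple source-to-sink paths, each returned as a list of link numbers.

    links: dict mapping link number -> (origin node, destination node).
    Links are treated as undirected here (an assumption); for a directed
    network, only the origin -> destination entry would be added.
    """
    adjacency = {}
    for number, (a, b) in links.items():
        adjacency.setdefault(a, []).append((b, number))
        adjacency.setdefault(b, []).append((a, number))

    paths = []

    def dfs(node, visited, path):
        if node == sink:
            paths.append(list(path))
            return
        for neighbor, number in adjacency.get(node, []):
            if neighbor not in visited:  # never revisit a node: simple paths only
                visited.add(neighbor)
                path.append(number)
                dfs(neighbor, visited, path)
                path.pop()               # backtrack
                visited.remove(neighbor)

    dfs(source, {source}, [])
    return paths


# Hypothetical four-node, five-link topology (node 1 = source, node 4 = sink).
example = {1: (1, 2), 2: (1, 3), 3: (2, 3), 4: (2, 4), 5: (3, 4)}
print(enumerate_paths(example, 1, 4))
```

A path is up only if all of its links are up, so the enumerated link lists are what a path-status measure would check at each network state.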

Two important premises for modeling the network are:

(1) information on times-to-failure and times-to-repair for each individual link is not available from the sponsor; instead,

(2) only information on overall link performance is available (in the form of monthly summaries on the network as a whole), specifically:

- a link failure occurs somewhere in the network, on average, once every 169 seconds, and

- the average down time (repair time) for a link is 754 seconds.

On this basis it is assumed that:

(a) the overall times between successive link failures anywhere in the network are exponentially distributed with mean 169 seconds,

(b) each link is equally likely to fail when the process is in-control (i.e., when none of the links is in the process of degrading), and

(c) each link has the same down time distribution, which is assumed to be exponential.

Given  these  additional  assumptions,  the  simulation  fails  and  repairs  links  using 
the  following  two  steps: 

(1)  Link  failure  occurrence  times  are  determined  by  repeated  sampling  from  the 
overall  time-between-failure  distribution. 

(2)  At  the  time  of  a  failure,  the  link  having  the  failure  is  determined  by  choosing 
one  of  the  ‘up’ links  at  random. 

This second step is accomplished by sampling from a uniform distribution between 0 and the total number of links in the network, (0, total links). The outcome corresponds to a link number in the network, and the status of that link is then checked. If the link has already failed, a new draw from the uniform distribution is made. This continues as necessary until a link with a status of 'up' is chosen. This procedure is valid as long as the process is in-control (i.e., all links are equally likely to fail). The structure of this logic is shown in the flowcharts in Figure 3.1. Modeling an out-of-control system (i.e., one in which a link is degrading) requires an assumption about how a link degrades. For example, does a degrading link fail more often, have longer down times, or have shorter times-to-failure? If a link is, say, five times as likely to fail as any other link, the uniform distribution for choosing a particular link could be altered to sample from (0, total links + 4), where the four 'extra' links are assigned to the degrading link.
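The selection step and the 'extra links' device for a degrading link can be sketched in Python as a rejection sampler. The function and parameter names are hypothetical, and an integer draw via `randrange` stands in for mapping a continuous uniform draw to a link number.

```python
import random


def choose_failed_link(up, rng, degrading_link=None, extra_slots=0):
    """Choose the link that fails by redrawing until an 'up' link is hit.

    up: list of booleans; up[k] is True if link k + 1 is operating.
    degrading_link: 1-based number of a degrading link (hypothetical option).
    extra_slots: extra draws assigned to the degrading link, making it
    (1 + extra_slots) times as likely to be picked as any other up link.
    """
    if not any(up):
        raise ValueError("every link is already down")
    if extra_slots and degrading_link is None:
        raise ValueError("extra_slots requires a degrading_link")
    total = len(up)
    while True:
        slot = rng.randrange(total + extra_slots)  # uniform over total + extra slots
        link = slot if slot < total else degrading_link - 1
        if up[link]:  # redraw if the chosen link has already failed
            return link + 1


rng = random.Random(42)
up = [True] * 77
picks = [choose_failed_link(up, rng, degrading_link=3, extra_slots=4)
         for _ in range(81000)]
share = picks.count(3) / len(picks)  # expected near 5/81, versus 1/81 per other link
```

With all 77 links up and four extra slots, link 3 should be chosen about 5/81 of the time, five times as often as any other link, which matches the 'five times as likely' example in the text.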

The simulation model is run on a 486/33 IBM-compatible personal computer using SLAMSYSTEM Version 4.5 for Windows. This is a commercial version of SLAMSYSTEM that requires a 'Sentinel' attachment on the parallel port in order to provide extended storage space for large simulations. This hardware attachment is obtainable only from Pritsker when purchasing the software (24). Microsoft FORTRAN Version 5.1 for DOS and Windows is also required to run the FORTRAN inserts. As written, the simulation creates approximately 12 MB of output files on the hard drive for one month of simulation. This amount depends on the size of the input communication network and can be reduced as needed by commenting out write statements in the FORTRAN code. A one-month simulation takes approximately 20 minutes to run. All SLAM II and FORTRAN code is contained in Appendix E.
Figure 3.1 Continuous Link Failure Routine (flowcharts for Event 4 and Event 5)
3.2.2 Simulation Validation.

The basic simulation model can be validated by noting that, when in control, the simulated network behaves like an M/M/s queueing model with a finite calling population (10:163-5). This representation can be realized by envisioning the links as 'customers' arriving at a repair facility, wherein the times between arrivals are independently and identically distributed (iid) according to an exponential distribution, all down times are regarded as service times that are also iid exponential, and each link is assumed to have its own "repairman" (server). This last assumption can be made because 'down time' or 'service time' starts as soon as a link fails; no failed link ever waits for repair. This means that s, the number of servers, equals N, the number of links in the network (the finite calling population). The rate diagram for this model is shown in Figure 3.2.


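Figure 3.2 Rate diagram for M/M/s model with finite calling population (N = s): from state n (n links down), failures occur at rate (N - n)λ, i.e., Nλ, (N - 1)λ, (N - 2)λ, ..., λ, and repairs occur at rate nμ.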


The average arrival rate over the long run, λ-bar, is the reciprocal of the mean of the overall time-between-failure distribution derived earlier from the monthly network summaries (λ-bar = 1/169 per second). Also derived from the monthly summaries is the mean service rate per busy server or repairman, μ, which is the reciprocal of the mean of the exponential down time distribution (μ = 1/754 per second). The mean arrival rate per link, λ, can be derived from these two rates, λ-bar and μ, together with the steady-state equations for the M/M/s queueing model (10:152,164-5). This derivation is shown in Appendix B.
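Appendix B is not reproduced in this chapter, so the following is a minimal sketch under the standard finite-source assumptions with one repairman per link. Each link alternates independently between up (failure rate λ) and down (repair rate μ), so the number of links down is binomial with p = λ/(λ + μ), L = Nλ/(λ + μ), and the long-run failure rate is λ-bar = λ(N - L). Solving for λ gives λ = λ-bar·μ/(Nμ - λ-bar), or equivalently MTTF = 1/λ = N/λ-bar - 1/μ. The function name is hypothetical.

```python
def per_link_rates(lam_bar, mu, n_links):
    """Per-link failure rate lam and mean number of links down L.

    Assumes the finite-source M/M/N model with one repairman per link, in which
    lam_bar = N * lam * mu / (lam + mu). A positive lam exists only if
    lam_bar < N * mu.
    """
    if n_links * mu <= lam_bar:
        raise ValueError("lam_bar must be below n_links * mu")
    lam = lam_bar * mu / (n_links * mu - lam_bar)  # solve lam_bar equation for lam
    L = n_links * lam / (lam + mu)                  # expected number of links down
    return lam, L


# Case study network: lam_bar = 1/169 per second, mu = 1/754 per second, N = 77.
lam, L = per_link_rates(1 / 169, 1 / 754, 77)
mttf = 1 / lam  # 77 * 169 - 754 = 12,259 seconds per link
```

Under these assumptions L reduces to λ-bar/μ = 754/169 ≈ 4.46, the value used for the case study network in Chapter 4.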

Initial validation of the model uses the four-node, five-link network shown in Figure 3.3, where λ-bar = 1/169 per second and μ = 1/120 per second. (A larger value of μ, i.e., faster repair, is used in this small validation network to prevent links from failing faster than they are repaired, which would produce an almost constant all-links-down condition.) Using these rates, the steady-state equations in Appendix B are solved for λ and for the expected number of customers in the queueing system, L. The value obtained for λ is compared to each link's observed end-of-simulation MTTF (Mean Time to Failure) via 1/λ = MTTF, and L is compared to the average number of links down (failed) during the simulation. The equations and computations for these values are shown in Appendix C.


Figure  3.3  Validation  Network 
The simulation was run for 10 months in order to ensure that steady-state conditions were reached. The SLAM II Summary Report, also shown in Appendix C, contains the (steady-state) values of each link's MTTF and the average number of links down (failed) during the simulation for comparison with the theoretical values of 1/λ and L. As can be seen, 1/λ and L agree quite well with their respective simulation values (within 0.002 links for L and an average of 4.4 seconds for 1/λ), indicating that the simulation model is valid. An additional cross check can be made by examining the means and standard deviations of the end-of-simulation sample MTBF (Mean Time Between Failure), sample MTTF (Mean Time to Failure), and sample MTTR (Mean Time to Repair) for each simulated link on the SLAM II Summary Report in Appendix C. The term sample means that the sample MTTF, for example, is estimated from the simulation data (i.e., it is not a theoretical value). The means and standard deviations should be close, since these measures are supposed to come from exponential distributions and the standard deviation of an exponential distribution equals its mean. The summary report shows that this cross check also validates the simulation.
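The SLAM II model itself is in Appendix E. The Python below is an independent, simplified re-implementation of the two-step failure scheme of Section 3.2.1, offered only to illustrate the validation comparison: exponential network-wide times between failures, a uniformly chosen 'up' link fails, and each failed link has its own exponential repair. It compares the time-average number of links down with L = λ-bar/μ. Skipping a failure when every link is already down is an assumption the text does not address, and the function name is hypothetical.

```python
import heapq
import random


def average_links_down(n_links, mean_tbf, mean_repair, horizon, seed=0):
    """Time-average number of links down over [0, horizon) seconds."""
    rng = random.Random(seed)
    up = [True] * n_links
    repairs = []   # heap of (repair completion time, link index)
    down = 0       # current number of links down
    area = 0.0     # integral of the number down over time
    t = 0.0
    next_failure = rng.expovariate(1 / mean_tbf)
    while True:
        next_repair = repairs[0][0] if repairs else float("inf")
        t_next = min(next_failure, next_repair, horizon)
        area += down * (t_next - t)
        t = t_next
        if t >= horizon:
            return area / horizon
        if next_repair <= next_failure:          # a repair completes
            _, link = heapq.heappop(repairs)
            up[link] = True
            down -= 1
        else:                                    # a failure occurs
            if down < n_links:                   # skipped if all links are down
                link = rng.randrange(n_links)
                while not up[link]:              # redraw until an 'up' link
                    link = rng.randrange(n_links)
                up[link] = False
                down += 1
                heapq.heappush(repairs, (t + rng.expovariate(1 / mean_repair), link))
            next_failure = t + rng.expovariate(1 / mean_tbf)


# Validation network: 5 links, mean 169 s between failures, mean 120 s repairs.
simulated = average_links_down(5, 169.0, 120.0, horizon=10 * 30 * 24 * 3600)
theoretical = 120.0 / 169.0  # L = lam_bar / mu
```

Over a simulated 10 months the two values should agree closely, mirroring the Appendix C comparison.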

3.2.3  Running  the  Simulation. 

Once the simulation was validated using the 4-node, 5-link network, the case study communication network was input and run to obtain the needed data. This network has 41 nodes and 77 links and is shown in Figure 3.4. Table 3.1 shows the link number assignments of this network. The required input file describing the network topology for the path enumeration subroutines is shown in Appendix F. This file must contain the total number of nodes, the total number of links, and each link's origin and destination node pair along with its corresponding link number assignment. The node labeled '1' is assumed to be the source node (s), and the node labeled with the total number of nodes (41 in this case study) is assumed to be the sink node (t). See Appendix F for the proper format and also for output from the path enumeration subroutine for this case study. The sponsor approved this network as an acceptable sample network. The values for the overall mean time between link failures, 169 seconds (average arrival rate over the long run, λ-bar = 1/169 per second), and the mean down time, 754 seconds (μ = 1/754 per second), are the same values obtained earlier from the data provided by the sponsor.

Figure 3.4 Case Study Network

Table 3.1 Case Study Network - Link Number Assignments

Upon running the simulation, steady state is approximated by the end of the first month. Again, the steady-state equations in Appendix B are solved for λ and L. These theoretical values are then compared to their respective sample values from the simulation, as was done for the validation network. This comparison again validates the simulation and is shown in Appendix D.

Finally, three items of concern that arose while building and running the simulation are worthy of attention. First, there is a distinct difference between the terms MTBF (Mean Time Between Failure) and MTTF (Mean Time to Failure). MTBF is the time from the failure of a piece of equipment until it fails again (including the repair time), while MTTF is the time from the end of the last repair until the next failure; in the long run, MTBF = MTTF + MTTR. One needs to be sure which term is being used when comparing these times to the failure and repair rates. Second, multiple runs of the simulation are not necessary, since the only purpose of the simulation is to obtain example data for a hypothetical communication network. Third, an initial transient period in the data is not a problem, since the probability of being in the initial state, P_0, is of the same order as the probabilities of being in any of the other 10 most probable states (see Appendix D). Thus, the data obtained from the simulation can be expected to be consistent with what might be observed in the long run from the actual communication network when it is operating in-control.
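To make the MTBF/MTTF distinction concrete, the following Python sketch computes MTBF, MTTF, and MTTR for a single link from a chronological log of (failure time, repair completion time) pairs. The function name and log are hypothetical. Ignoring the up period before the first recorded failure is an assumption, since that period has no preceding repair.

```python
def link_time_stats(events):
    """MTBF, MTTF and MTTR for one link from its chronological failure log.

    events: list of (failure_time, repair_completion_time) pairs, at least two.
    MTBF: failure to the next failure (includes the repair time).
    MTTF: end of a repair to the next failure.
    MTTR: failure to the end of its repair.
    """
    if len(events) < 2:
        raise ValueError("need at least two failures")
    cycles = list(zip(events, events[1:]))
    mtbf = sum(nxt[0] - cur[0] for cur, nxt in cycles) / len(cycles)
    mttf = sum(nxt[0] - cur[1] for cur, nxt in cycles) / len(cycles)
    mttr = sum(repair - fail for fail, repair in events) / len(events)
    return mtbf, mttf, mttr


log = [(100.0, 130.0), (400.0, 450.0), (700.0, 720.0)]  # hypothetical, seconds
mtbf, mttf, mttr = link_time_stats(log)
```

Here MTBF (300 s) exceeds MTTF (260 s) by the mean of the first two repair times (40 s); over a long record that gap approaches MTTR, which is why the two terms must not be confused when converting to rates.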
3.3 Summary

The  methods  for  using  the  selected  control  charts  have  been  reviewed  along  with 
the  performance  measures  which  will  be  investigated  on  each  type  of  chart.  Also,  the 
simulation  model  used  to  obtain  sample  data  was  discussed.  The  results  of  plotting  each 
of  these  performance  measures  are  discussed  in  the  next  chapter. 
4. Case Study Results


This chapter evaluates the identified performance measures and their applicable control charts for appropriateness in monitoring a communication network. This is done through a case study developed using the simulated network shown in Figure 3.4. Each of the measures for the three monitoring viewpoints outlined in Chapter 2 will be discussed. The procedures in Chapter 3 for proper control chart construction are incorporated into EXCEL spreadsheets and macros that expand on the work of Horton (11). These EXCEL spreadsheets and macros are used extensively to complete the case study and to automate the various control procedures for the sponsor. Once appropriate charts have been evaluated, recommendations on establishing Level of Service (LOS) Agreement specifications will be discussed.


4.1  Overall  Network  Performance  Measures 

Three  measures  were  identified  as  potential  indicators  of  overall  network 
performance.  These  are  number  of  links  down  (DwnLnk),  proportion  of  operating  links 
(p-up),  and  proportion  of  links  down  (p-down).  First,  the  expected  values  of  these 
measures  for  an  in-control  network  are  derived  for  use  in  developing  control  charts  based 
on  standards,  followed  by  a  demonstration  of  the  use  of  these  charts,  especially  for 
monitoring  network  degradation. 

4.1.1  Theoretical  Considerations. 

As stated in Chapter 3, the simulation model can be viewed as a representation of an M/M/s queueing model with a finite calling population. The steady-state equations for this model are presented in Appendix B. In these equations, L = 4.46 is the long-run expected number of links down (failed). This is the theoretical, unconditional mean of the number of links down, DwnLnk, in the long run. Since there are 77 links in the network (the sample size), L/77 is the expected proportion of links down in the long run, which equivalently represents the probability that a specific link will be down at an arbitrary point in time. Given this framework, the number of links that will be down whenever the state of the network is observed can be modeled as a binomial random variable with parameters n = 77 and p = L/77 = 4.46/77 (see the conditions for using the binomial probability model in Section 3.1.3). The parameters n and p can then be used to compute the theoretical means of p-down = DwnLnk/77 and p-up = 1 - p-down, which, in turn, yield the following control limits for the three overall network performance measures.


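np chart for DwnLnk:

$$\mathrm{UCL} = np + 3\sqrt{np(1-p)} = 4.46 + 3\sqrt{4.46\left(1 - \frac{4.46}{77}\right)} = 10.6094$$

$$\mathrm{CL} = np = 77\left(\frac{4.46}{77}\right) = 4.46$$

$$\mathrm{LCL} = np - 3\sqrt{np(1-p)} = 4.46 - 3\sqrt{4.46\left(1 - \frac{4.46}{77}\right)} = -1.6894 \;(\text{set to } 0)$$

p chart for p-down:

$$\mathrm{UCL} = p + 3\sqrt{\frac{p(1-p)}{n}} = \frac{4.46}{77} + 3\sqrt{\frac{\frac{4.46}{77}\left(1 - \frac{4.46}{77}\right)}{77}} = 0.1378$$

$$\mathrm{CL} = p = \frac{4.46}{77} = 0.0579$$

$$\mathrm{LCL} = p - 3\sqrt{\frac{p(1-p)}{n}} = \frac{4.46}{77} - 3\sqrt{\frac{\frac{4.46}{77}\left(1 - \frac{4.46}{77}\right)}{77}} = -0.0219 \;(\text{set to } 0)$$

and p chart for p-up:

$$\mathrm{UCL} = (1-p) + 3\sqrt{\frac{p(1-p)}{n}} = \left(1 - \frac{4.46}{77}\right) + 3\sqrt{\frac{\frac{4.46}{77}\left(1 - \frac{4.46}{77}\right)}{77}} = 1.022 \;(\text{set to } 1)$$

$$\mathrm{CL} = 1 - p = 1 - \frac{4.46}{77} = 0.9421$$

$$\mathrm{LCL} = (1-p) - 3\sqrt{\frac{p(1-p)}{n}} = \left(1 - \frac{4.46}{77}\right) - 3\sqrt{\frac{\frac{4.46}{77}\left(1 - \frac{4.46}{77}\right)}{77}} = 0.8622$$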


Hence, the standards for the distributions of all three performance measures are known (for the case study network model) and do not need to be estimated from the sample data in order to compute the control limits. Notice that the count cannot be less than 0 and the proportions must lie between 0 and 1 inclusive. Their control limits must respect the same bounds to be meaningful, which is why the negative lower limits above are set to 0 and the upper limit of 1.022 for p-up is set to 1.
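The arithmetic above can be checked with a short Python sketch that computes all three sets of limits from L and n, using p = L/n as the known standard and truncating each limit to the feasible range. The function name is hypothetical.

```python
import math


def known_standard_limits(L, n):
    """(LCL, CL, UCL) for DwnLnk (np chart), p-down and p-up (p charts),
    using the known fraction of links down p = L / n."""
    p = L / n
    sd_count = math.sqrt(n * p * (1 - p))  # sigma of the binomial count
    sd_prop = sd_count / n                  # sigma of the proportion
    np_chart = (max(0.0, n * p - 3 * sd_count), n * p, min(n, n * p + 3 * sd_count))
    p_down = (max(0.0, p - 3 * sd_prop), p, min(1.0, p + 3 * sd_prop))
    p_up = (max(0.0, (1 - p) - 3 * sd_prop), 1 - p, min(1.0, (1 - p) + 3 * sd_prop))
    return np_chart, p_down, p_up


np_chart, p_down, p_up = known_standard_limits(4.46, 77)
for name, limits in (("DwnLnk", np_chart), ("p-down", p_down), ("p-up", p_up)):
    print(name, [round(v, 4) for v in limits])
```

Rounded to four places, these reproduce the UCL of 10.6094, the p-down limits of 0 and 0.1378, and the p-up limits of 0.8622 and 1 shown above.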

4.1.2  Autocorrelation  of  Data  Points. 

As mentioned in Section 3.1.1.2, autocorrelation may exist between consecutive observations of the number of links down if the time between observations is short, and especially if the data collection rate is faster than the repair rate of the links. The repair rate of the links for the case study network is 1/754 per second (a mean repair time of 754 seconds), much slower than the collection rate of one observation every 300 seconds. Therefore, possible autocorrelation must be investigated. The number of links down is shown in time series plots in Figures 4.1 - 4.4 using collection rates of once every 300 seconds, 15 minutes, 30 minutes, and hour (chosen to compare a wide range of collection rates, and for their even intervals, which ease data collection). The first 96 observations of a one-week period of data are charted for each collection rate. As the data collection interval gets larger, the evidence of patterns (autocorrelation) becomes less prominent.
Figure 4.2 Time Series Plot of 1/15 min DwnLnk Data

Figure 4.4 Time Series Plot of 1/hr DwnLnk Data
One week of data for the measure DwnLnk was then tested for autocorrelation at lag 1 using the software package STATISTIX Version 4.0 (26). This software uses the Box and Jenkins estimate of the autocorrelation function at lag k shown below (5:26-32):

$$r_k = \frac{c_k}{c_0}, \qquad c_k = \frac{1}{n}\sum_{t=1}^{n-k}(z_t - \bar{z})(z_{t+k} - \bar{z}), \quad k = 0, 1, 2, \ldots$$

where the data observations are z_1, z_2, ..., z_n and:

$$\bar{z} = \frac{1}{n}\sum_{t=1}^{n} z_t$$

To test whether an observed r_k value is significantly different from zero (i.e., to test whether the theoretical autocorrelation at lag k is zero, ρ_k = 0), Makridakis outlines a standard error formula for random data. If the data is truly random (not autocorrelated), 95% of the sample autocorrelation coefficients should lie within the limits (16:367-9):

$$-1.96\left(\frac{1}{\sqrt{n}}\right) < r_k < 1.96\left(\frac{1}{\sqrt{n}}\right)$$

Hence, this can be used as a rough guideline for determining whether autocorrelation is present. The autocorrelation values for all three performance measures were identical, since p-down and p-up are linear functions of the number of links down and autocorrelation is unchanged by such transformations. These values are shown in Table 4.1 with their corresponding limits, where n is the number of observations collected in one week at each collection rate.
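The estimator and the rough ±1.96/√n guideline can be sketched in a few lines of Python. This is a plain transcription of the formulas above, not STATISTIX's implementation, and the function names are hypothetical.

```python
import math


def autocorrelation(z, k):
    """Sample autocorrelation r_k = c_k / c_0 (Box and Jenkins estimator)."""
    n = len(z)
    mean = sum(z) / n
    c0 = sum((x - mean) ** 2 for x in z) / n
    ck = sum((z[t] - mean) * (z[t + k] - mean) for t in range(n - k)) / n
    return ck / c0


def white_noise_limit(n):
    """Approximate 95% limit for |r_k| when the n observations are random."""
    return 1.96 / math.sqrt(n)


counts = [4, 6, 3, 5, 7, 2]                    # hypothetical DwnLnk observations
p_up = [1 - d / 77 for d in counts]            # a linear function of the counts
r1 = autocorrelation(counts, 1)
significant = abs(r1) > white_noise_limit(len(counts))
```

Because p-up is a linear function of DwnLnk, `autocorrelation(p_up, 1)` equals `r1`, which is why a single set of values serves all three measures. The limits for n = 2016, 672, 336, and 168 round to the 0.044, 0.076, 0.107, and 0.151 in Table 4.1.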
Table 4.1 Autocorrelation for Overall Performance Measures (1 week of data)

Collection Rate    r1       Limits                 Sample Size   Autocorrelation
1/300 seconds      0.674    -0.044 < r1 < 0.044    2016          significant
1/15 minutes       0.301    -0.076 < r1 < 0.076     672          significant
1/30 minutes       0.047    -0.107 < r1 < 0.107     336          not significant
1/hour            -0.044    -0.151 < r1 < 0.151     168          not significant

From  this  autocorrelation  information,  collecting  data  either  once  every  30  minutes  or 
once  every  hour  should  provide  uncorrelated  data  points.  One  of  these  rates  should  be 
used  if  conventional  control  charts  are  to  be  used.  If  a  faster  collection  rate  is  desired  in 
order  to  detect  process  shifts  sooner,  special  procedures  are  required  as  outlined  in 
Montgomery  (18:341-51).  For  this  case  study,  the  rate  of  1/hr  will  be  used  for 
consistency. 


4.1.3 Demonstration of Procedures.

Each performance measure has one or more appropriate control charts for monitoring it, depending on the type of data the measure represents. The charts for each performance measure were identified in Chapter 3 and are demonstrated here. Each chart in this section is shown with its control limits (UCL, CL, and LCL) and its 1-sigma and 2-sigma warning limits. The np and p chart limits used in this section were computed in Section 4.1.1.

4.1.3.1 Down Links. The measure DwnLnk has two possible control charting techniques: the np chart and the x-bar and R charts. The np chart is shown in Figure 4.5. Applying the runs rules from Section 2.5.10 shows that only one point on the np chart (Sample 60) plots out-of-control. Since it is known that there is no assignable cause for Sample 60 (no assignable causes were built into the simulation), the process is concluded to be in-control. The samples were generated from a simulation designed to be in-control; hence, the single out-of-control point is a chance occurrence resulting from the random draws from the exponential distributions in the simulation.
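Section 2.5.10 is not part of this chapter, so the following Python sketch assumes the four classic Western Electric zone rules as a stand-in for the runs rules. It flags one point beyond 3 sigma, 2 of 3 consecutive points beyond 2 sigma on the same side, 4 of 5 beyond 1 sigma on the same side, and 8 consecutive points on one side of the center line. The rule set actually used in the thesis may differ, and the function name is hypothetical.

```python
import math


def runs_rule_signals(values, center, sigma):
    """Indices of points that trigger any of the four Western Electric rules."""
    z = [(v - center) / sigma for v in values]
    signals = set()
    for i, zi in enumerate(z):
        if abs(zi) > 3:                                   # Rule 1: beyond 3 sigma
            signals.add(i)
        for side in (1, -1):                              # above, then below, the CL
            last3 = z[i - 2:i + 1] if i >= 2 else []
            if side * zi > 2 and sum(side * x > 2 for x in last3) >= 2:  # Rule 2
                signals.add(i)
            last5 = z[i - 4:i + 1] if i >= 4 else []
            if side * zi > 1 and sum(side * x > 1 for x in last5) >= 4:  # Rule 3
                signals.add(i)
            last8 = z[i - 7:i + 1] if i >= 7 else []
            if last8 and all(side * x > 0 for x in last8):               # Rule 4
                signals.add(i)
    return sorted(signals)


# np chart for DwnLnk: CL = np = 4.46, sigma = sqrt(np(1 - p)).
sigma = math.sqrt(4.46 * (1 - 4.46 / 77))
print(runs_rule_signals([4, 5, 3, 11, 4], 4.46, sigma))
```

In this hypothetical series only the count of 11, above the UCL of 10.6094, is flagged.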


Figure 4.5 np Chart for DwnLnk (1/hr collection rate): count values by sample number, with UCL(np), CL(np), LCL(np), and 1-sigma and 2-sigma warning limits


In  actuality  DwnLnk,  the  number  of  down  links,  is  a  time-persistent  variable  in 
the  network  that  changes  value  whenever  a  link  fails  or  is  repaired  (not  necessarily  at 
regular  intervals).  If  this  time-persistent  data  (instead  of  the  ‘snapshot’  data  taken  at 
regular  intervals)  were  plotted  it  would  look  similar  to  Figure  4.6. 
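A time-persistent variable such as DwnLnk is a step function, so its exact average over an interval weights each level by how long it persisted. The Python sketch below (hypothetical names and data) computes that exact average and, for comparison, the average of hourly 'snapshots' of the same step function.

```python
import bisect


def time_weighted_average(change_times, levels, start, end):
    """Exact average of a step function over [start, end).

    change_times must be sorted with change_times[0] <= start; the variable
    takes the value levels[i] from change_times[i] until the next change.
    """
    total = 0.0
    for i, level in enumerate(levels):
        seg_start = max(change_times[i], start)
        seg_end = min(change_times[i + 1] if i + 1 < len(levels) else end, end)
        if seg_end > seg_start:
            total += level * (seg_end - seg_start)
    return total / (end - start)


def snapshot_average(change_times, levels, sample_times):
    """Average of the values seen at the given observation instants."""
    observed = [levels[bisect.bisect_right(change_times, s) - 1] for s in sample_times]
    return sum(observed) / len(observed)


# Hypothetical day (times in hours): 3 links down, then 5, 4 and 6.
changes, counts = [0.0, 5.5, 12.25, 20.0], [3, 5, 4, 6]
exact = time_weighted_average(changes, counts, 0.0, 24.0)  # 105.25 / 24
hourly = snapshot_average(changes, counts, range(24))      # 105 / 24
```

In this example the two averages differ by about 0.01 links.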
Figure 4.6 Behavior of Time-persistent Variable DwnLnk


The average of this data over a one-day interval can be approximated by averaging the numbers observed at each hourly reporting time (the 1/hr DwnLnk data used for the np chart earlier). This new statistic is therefore a reasonable surrogate for the average of the original time-persistent variable, so the 'snapshot' data can be used in place of the time-persistent data if collecting the time-persistent data is, for example, too costly. The mean and variance (or range) of the average number of links down per day can then be estimated using the equations in Section 3.1.1 and used to compute the following control limits (one month of hourly observations, grouped into daily subgroups of size n = 24, was used for estimation).

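x-bar chart for DwnLnk (A_2 = 3/(d_2√n) ≈ 0.157 for n = 24):

$$\mathrm{UCL} = \bar{\bar{x}} + A_2\bar{R} = 4.4139 + 0.1572(7.8) = 5.6402$$

$$\mathrm{CL} = \bar{\bar{x}} = 4.4139$$

$$\mathrm{LCL} = \bar{\bar{x}} - A_2\bar{R} = 4.4139 - 0.1572(7.8) = 3.1876$$

R chart for DwnLnk (d_2 = 3.895 and d_3 = 0.712 for n = 24):

$$\mathrm{UCL} = \bar{R} + 3d_3\frac{\bar{R}}{d_2} = D_4\bar{R} = 1.548(7.8) = 12.0744$$

$$\mathrm{CL} = \bar{R} = 7.8$$

$$\mathrm{LCL} = \bar{R} - 3d_3\frac{\bar{R}}{d_2} = D_3\bar{R} = 0.451(7.8) = 3.5178$$

where D_4 = 1 + 3d_3/d_2 ≈ 1.548 and D_3 = 1 - 3d_3/d_2 ≈ 0.451 are the tabulated R chart constants for n = 24.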
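The following Python sketch reproduces these limits from the grand mean, the average range, and the constants d_2 and d_3, deriving A_2, D_3, and D_4 from them instead of reading them from a table. The function name is hypothetical. Because the text uses the rounded table values D_4 = 1.548 and D_3 = 0.451, the R chart limits computed here differ from 12.0744 and 3.5178 only in the third decimal place.

```python
import math


def xbar_r_limits(xbarbar, rbar, n, d2, d3):
    """3-sigma x-bar and R chart limits, each returned as (LCL, CL, UCL)."""
    a2 = 3 / (d2 * math.sqrt(n))
    big_d3 = max(0.0, 1 - 3 * d3 / d2)  # D3 (truncated at 0 for small n)
    big_d4 = 1 + 3 * d3 / d2            # D4
    xbar = (xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar)
    r = (big_d3 * rbar, rbar, big_d4 * rbar)
    return xbar, r


# Daily subgroups of n = 24 hourly DwnLnk observations (values from the text).
xbar, r = xbar_r_limits(4.4139, 7.8, 24, 3.895, 0.712)
```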

The x-bar and R charts are shown in Figure 4.7 and Figure 4.8. The hourly data is subgrouped into daily samples of size n = 24. Applying the runs rules to these charts shows that two points on the x-bar chart (Samples/Days 10 and 21) and a run of 2 on the R chart (Days 17 and 18) plot out-of-control. Again, the samples were generated from the in-control simulation, so there is no assignable cause for the out-of-control points. Investigating Day 10 for demonstration purposes shows that this sample's 24 data points include 4 values of 6, 4 values of 7, 3 values of 8, and 2 values of 9. With a UCL of 5.6402, 13 of the 24 data points (more than half) were above this limit. Examining the individual data points in each sample helps determine whether an out-of-control point was caused by one or two unusual data points or whether an actual shift in the mean occurred (18:212). Here, since some of Day 10's data points are unusually high, and since no trend toward high values of DwnLnk is evident after Day 10, this out-of-control indication could be deemed a random occurrence if the search for an assignable cause of the high values proves fruitless. This demonstrates how the x-bar chart detects a possible out-of-control condition and thus prompts a search for an assignable cause. For example, the network monitor could search for an assignable cause for the high number of links down by investigating the 'conditions' of the network during Day 10, such as by checking the network controller's log for any unusual occurrences on Day 10. An electrical storm may have disrupted the network, or unusually high traffic loads may have caused links to fail more often. An investigation of this type could reveal obvious assignable causes like these, or a less obvious cause that would not have been noticed without the control chart's prompting.

Both the np chart and the x-bar and R chart techniques seem viable for monitoring the performance measure DwnLnk. The choice between them should be based on the desired rate of detection for out-of-control points. If a shift in the average number of links down needs to be detected in a matter of hours rather than days, the np chart should be used. Also, since the observations are grouped on the x-bar and R charts, each plotted point represents a longer time interval, so the charts compare, for example, daily values instead of hourly values. Consequently, if hourly comparisons are desired when the data collection rate is 1/hr, the np chart is the logical choice; if daily comparisons are desired, the x-bar and R charts should be chosen. One additional consideration is that x-bar and R charts aggregate data, which could smooth out shifts and cause slower detection (or no detection at all for small shifts), but they also allow direct monitoring of the average number of links down per day if that is a concern for the network. Alternatively, the np chart plots the individual data points, so each measurement can be monitored directly and aggregation is not a concern.

4.1.3.2  Proportion of Operating Links. The measure p-up has one possible control charting technique: the p chart. (The technique above for averaging the counts of DwnLnk on an x-bar chart can also be applied to p-up, and to p-down in the next section, but it will not be covered again to avoid redundancy.) The p chart is shown in Figure 4.9 (Figure 4.9 is shown with Figure 4.10 so that it can be more easily compared to p-down). Applying the runs rules shows that only one point (Sample 60) is plotting out-of-control. This is the same out-of-control point identified on the np chart for the measure DwnLnk. This is understandable since the value of DwnLnk is used in the computation of p-up. The high number of links down has lowered the overall link reliability, resulting in an out-of-control indication. This chart is well suited to monitoring the performance measure p-up because this measure is based on the binomial count DwnLnk, as demonstrated in Sections 3.1.3 and 4.1.1. Each individual data point is plotted, so there is no concern over aggregation of data.

[Figure: p-up 1hr; axes: Sample, p]
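p-up is a binomial proportion with n = 77 links per sample, so its three-sigma p chart limits follow directly from a known standard. The minimal sketch below assumes that standard is the down probability 4.46/77 quoted for the links in Section 4.3.1. That value is an assumption here, not a restatement of Section 4.1.1. The helper `p_chart_limits` is hypothetical. Limits are clipped to [0, 1] because a proportion cannot leave that range. The p-up limits come out as 1 minus the p-down limits, with lower and upper swapped.

```python
import math


def p_chart_limits(p_center, n, k=3.0):
    """Three-sigma p chart limits (LCL, CL, UCL) for a binomial proportion with
    known center p_center and sample size n, clipped to the range [0, 1]."""
    sigma = math.sqrt(p_center * (1.0 - p_center) / n)
    lcl = max(0.0, p_center - k * sigma)
    ucl = min(1.0, p_center + k * sigma)
    return lcl, p_center, ucl


# Assumed standard: probability a link is down = 4.46/77; n = 77 links per sample.
p_down = 4.46 / 77
down_limits = p_chart_limits(p_down, 77)        # chart for p-down
up_limits = p_chart_limits(1.0 - p_down, 77)    # chart for p-up = 1 - p-down
print("p-down:", [round(v, 4) for v in down_limits])
print("p-up:  ", [round(v, 4) for v in up_limits])
```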

4.1.3.3  Proportion of Links Down. The measure p-down also has one possible control charting technique: the p chart. This chart is shown in Figure 4.10.

Once again, applying the runs rules shows that only one point (Sample 60) is plotting out-of-control. This is the same out-of-control point identified on both the np chart for DwnLnk and the p chart for p-up. This is expected because p-down is the complement of p-up (p-down = 1 - p-up). The high number of links down has raised the overall proportion of links down, resulting in an out-of-control indication. This chart is also well suited to monitoring the performance measure p-down, since this measure is also based on the binomial count DwnLnk, as demonstrated in Sections 3.1.3 and 4.1.1. Here too, each individual data point is plotted, so again there is no concern over aggregation of data. Comparing the p-down and DwnLnk charts (Figure 4.10 and Figure 4.5), it is easily seen that the p chart is just a ‘rescaling’ of the np chart: the plotted points are identical in relation to their respective control limits. Also, comparing the p-down and p-up charts (Figure 4.10 and Figure 4.9), it is easily seen that these charts are ‘mirror images’ of each other.
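The ‘rescaling’ and ‘mirror image’ observations can be checked by computing where a point sits relative to its center line, in units of sigma. This is a minimal sketch. It assumes the same standard as above (n = 77, down probability 4.46/77) and a hypothetical DwnLnk value of 9. Dividing a count by n rescales the point and the limits by the same factor, so the standardized position on the np chart equals that on the p-down chart. Replacing p-down by 1 - p-down flips its sign.

```python
import math


def z_np(count, n, p):
    """Standardized position of a count on an np chart with center n*p."""
    return (count - n * p) / math.sqrt(n * p * (1 - p))


def z_p(prop, n, p):
    """Standardized position of a proportion on a p chart with center p."""
    return (prop - p) / math.sqrt(p * (1 - p) / n)


n, p = 77, 4.46 / 77          # assumed standard (probability a link is down)
count = 9                     # hypothetical DwnLnk value
z_count = z_np(count, n, p)                 # np chart for DwnLnk
z_down = z_p(count / n, n, p)               # p chart for p-down = DwnLnk / n
z_up = z_p(1 - count / n, n, 1 - p)         # p chart for p-up = 1 - p-down
print(round(z_count, 3), round(z_down, 3), round(z_up, 3))
```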

Therefore, since all three measures (DwnLnk, p-up, and p-down) are so closely related, the sponsor probably needs to plot only one of them. The choice between the measures will depend on what makes the most sense to the network controller/monitor: a proportion of links up (p-up), a proportion of links down (p-down), or a simple count of the number of links down (DwnLnk).
4.1.4  Degradation Monitoring.

As stated in Section 2.3, degradation will be monitored by charting these and other performance measures. The three performance measures identified as indicators of overall network performance (DwnLnk, p-up, and p-down) can all serve as indirect measures of network degradation through the runs rules. The runs rules evaluate the entire sequence of points on a control chart, which enables the chart to detect small shifts and decaying or degrading conditions. A single out-of-control point on a control chart should be investigated for an assignable cause, and so should a run of points satisfying the runs rules. Assuming that degradation of the network manifests itself as more links failing over time, a slow increase in DwnLnk and p-down or a slow decrease in p-up indicates degradation in the network. Abrupt shifts can also occur, indicating an abrupt degradation rather than a slowly decaying one. These patterns should be watched for on the control charts to monitor network degradation.
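The runs rules themselves are defined earlier in the thesis and are not restated in this section. The minimal sketch below therefore assumes the four common Western Electric zone rules, which may differ in detail from the thesis's set. It shows how a sequence of points, rather than a single point, can signal a slow drift. The function name `runs_rule_signals` is hypothetical. Each signal is reported at the last point of the offending window.

```python
def runs_rule_signals(points, center, sigma):
    """Return sorted indices flagged by four Western Electric-style rules:
    1) one point beyond 3 sigma;
    2) two of three consecutive points beyond 2 sigma on the same side;
    3) four of five consecutive points beyond 1 sigma on the same side;
    4) eight consecutive points on the same side of the center line."""
    z = [(x - center) / sigma for x in points]
    flagged = set()
    for i in range(len(z)):
        if abs(z[i]) > 3:
            flagged.add(i)
        for side in (1, -1):  # +1 checks above the center line, -1 below
            if i >= 2 and sum(side * v > 2 for v in z[i - 2:i + 1]) >= 2:
                flagged.add(i)
            if i >= 4 and sum(side * v > 1 for v in z[i - 4:i + 1]) >= 4:
                flagged.add(i)
            if i >= 7 and all(side * v > 0 for v in z[i - 7:i + 1]):
                flagged.add(i)
    return sorted(flagged)


# A small sustained upward drift of half a sigma: no point is beyond any
# limit, yet rule 4 signals once eight points sit above the center line.
drift = [0.0] * 10 + [0.5] * 8
print(runs_rule_signals(drift, center=0.0, sigma=1.0))
```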

4.1.4.1  Degradation Case Study. After the data was obtained for the in-control case study network, the simulation was reprogrammed to include a degradation. In this degradation, link number 13 in the same network (Figure 3.4 and Table 3.1) failed five times more often than each of the other links in the network. The degradation occurred abruptly rather than slowly over time. This degradation case study will be followed through each of the three monitoring viewpoints, using one of the measures from each viewpoint in an attempt to detect the degradation.

In this first viewpoint of the overall network measures, the degraded data is plotted for hourly p-down, with the resulting p chart shown in Figure 4.11. An important note here is that the control limits have already been computed, either from known theoretical standards or from sample data collected while the process was in-control, and the new data is plotted on the already established chart. In the degraded condition, link number 13 fails five times as often as any other link. The overall time between failures for all links is held unchanged, and the remaining links all keep equal individual times to failure. This degradation began at Sample 25 and is not detected by the chart. This is to be expected. Because p-down measures the proportion of links down at any given time, no change should be seen from the degradation: link 13 fails more often, but the other links fail less often, so the overall time between failures stays the same. Therefore the proportion p-down will not indicate any shift, as the p chart in Figure 4.11 confirms. This particular degradation cannot be monitored using p-down because p-down, along with DwnLnk and p-up, is based on counts of the total number of links down. If this count is not affected by a particular degradation, then these measures will not detect it. This reinforces the need to use more than one performance measure to monitor the network completely.

Figure 4.11  p Chart for p-down (Degraded Link 13)
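The claim that the overall failure process is unchanged can be checked arithmetically. This is a minimal sketch under two assumptions. First, the network-wide failure rate is fixed at 1/169 per second, reading MTTF_overall = 169 from Section 4.3.1 as seconds between failure events anywhere in the 77-link network. Second, that rate is split among the links in proportion to weights, 5 for link 13 and 1 for every other link. The helper `per_link_rates` is hypothetical. With equal weights, each link's mean time between failures is 169 × 77 = 13013 s, the individual-link MTBF used later. In the degraded case, link 13's mean time between failures falls to 169 × 81 / 5 = 2737.8 s and every other link's rises to 169 × 81 = 13689 s. The total failure rate stays the same.

```python
N_LINKS = 77            # links in the case study network
OVERALL_TBF = 169.0     # assumed network-wide mean time between failures, seconds
DEGRADED_LINK = 13
WEIGHT = 5              # link 13 fails five times as often as any other link


def per_link_rates(weights, overall_rate):
    """Split a fixed network-wide failure rate among links in proportion to
    their weights; returns {link: failure rate per second}."""
    total = sum(weights.values())
    return {link: overall_rate * w / total for link, w in weights.items()}


overall_rate = 1.0 / OVERALL_TBF
baseline = per_link_rates({k: 1 for k in range(1, N_LINKS + 1)}, overall_rate)
weights = {k: (WEIGHT if k == DEGRADED_LINK else 1) for k in range(1, N_LINKS + 1)}
degraded = per_link_rates(weights, overall_rate)

print(f"baseline per-link mean time between failures: {1 / baseline[1]:.1f} s")
print(f"degraded link 13: {1 / degraded[DEGRADED_LINK]:.1f} s, others: {1 / degraded[1]:.1f} s")
print(f"total rate unchanged: {abs(sum(degraded.values()) - overall_rate) < 1e-15}")
```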
4.2  (s-t) Performance Measures

One measure was identified as a potential indicator of network performance for a customer's (s-t) pair: the proportion of operating paths (p-path). For this case study, the source (s) is node 1 and the sink (t) is node 41, as shown in Figure 3.4. The choices for s and t were made for simplicity's sake. Any two nodes can be chosen as long as they are renumbered so that s = 1 and t = the highest numbered node in the network, as the path enumeration subroutine requires. The path enumeration subroutine in the simulation identified 198 paths from s to t. These paths are shown in Appendix F.

First, theoretical considerations of this measure will be discussed, followed by a demonstration and then degradation monitoring using this measure.

4.2.1  Theoretical  Considerations. 

The  measure  p-path  appears  to  be  a  proportion  based  on  a  binomial  count  of  the 
number  of  operational  paths  with  parameters  n  and  p.  If  this  is  true,  then  the  parameter 
n  =  198  (total  paths)  is  known,  but  the  parameter  p  is  unknown  since  the  ‘theoretical 
average  number  of  non-operational  paths’  is  unknown.  Consequently,  the  standards  for 
this  performance  measure’s  binomial  distribution  are  unknown  (for  the  case  study 
network  model)  and  need  to  be  estimated  from  the  sample  data.  The  parameter  p  is 
estimated  from  the  sample  data  with  p-bar: 


$$\bar{p} = \frac{\sum_{i=1}^{m} D_i}{mn} = \frac{\sum_{i=1}^{m} \hat{p}_i}{m}$$

where m = number of samples observed, n = total number of paths, D_i is the number of non-operating paths in sample i (i = 1, 2, ..., m), and p-hat_i is the proportion of non-operating paths in sample i (18:149). Note that the measure used here, p-path, is the proportion of operating paths. The value (1 - p-bar) is then used to calculate the estimated mean and standard deviation for use in calculating the appropriate control limits.
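The two forms of the p-bar estimator above are algebraically identical, and the p-path center line is 1 - p-bar. A minimal sketch with hypothetical counts of non-operating paths (not case study data) and the hypothetical helper `p_bar`:

```python
def p_bar(down_counts, n):
    """Estimate p-bar = sum(D_i) / (m * n) from m samples of n paths each."""
    m = len(down_counts)
    return sum(down_counts) / (m * n)


n_paths = 198                    # enumerated s-t paths in the case study
D = [60, 55, 70, 58]             # hypothetical non-operating path counts D_i

pbar = p_bar(D, n_paths)
# Equivalent form: the average of the per-sample proportions p-hat_i = D_i / n.
pbar_alt = sum(d / n_paths for d in D) / len(D)
center_p_path = 1 - pbar         # center line for the proportion of operating paths
print(round(pbar, 4), round(pbar_alt, 4), round(center_p_path, 4))
```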


4.2.2  Autocorrelation  of  Data  Points. 

The measure p-path is collected at every ‘state’, so the autocorrelation of the data points is investigated for this new measure. Once again, collection rates of 1/300 seconds, 1/15 minutes, 1/30 minutes, and 1/hr were tested for autocorrelation at lag 1 using the software package STATISTIX Version 4.0 (26), and the results were compared against the 95% standard error limits. The results are shown in Table 4.2.


Table 4.2  Autocorrelation for p-path (1 week of data)

Collection Rate    r (lag 1)    95% Limits    Sample Size    Autocorrelation
1/300 seconds      0.639        ±0.044        2016           significant
1/15 minutes       0.246        ±0.076        672            significant
1/30 minutes       0.085        ±0.107        336            not significant
1/hour             0.147        ±0.151        168            not significant

From this autocorrelation information, collecting data once every 30 minutes or once every hour should provide uncorrelated data points for conventional control chart usage. As stated earlier, the rate of 1/hr will be used for consistency in this case study.
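The limits in Table 4.2 appear to be ±1.96/√N, the usual large-sample 95% limits for a lag-1 autocorrelation when there is no autocorrelation. That formula is an inference from the tabulated values, not a statement of how STATISTIX computes them. The minimal sketch below reproduces the limits and the significance calls from the tabulated r values. It also defines a sample lag-1 autocorrelation for use on raw data. Both function names are hypothetical.

```python
import math


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation:
    r1 = sum((x_t - xbar)(x_{t+1} - xbar)) / sum((x_t - xbar)^2)."""
    n = len(x)
    xbar = sum(x) / n
    num = sum((x[t] - xbar) * (x[t + 1] - xbar) for t in range(n - 1))
    den = sum((v - xbar) ** 2 for v in x)
    return num / den


def limit_95(n):
    """Approximate 95% limit for r1 under no autocorrelation: 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)


# (collection rate, r1 from Table 4.2, sample size)
table = [("1/300 seconds", 0.639, 2016), ("1/15 minutes", 0.246, 672),
         ("1/30 minutes", 0.085, 336), ("1/hour", 0.147, 168)]
verdicts = {rate: abs(r1) > limit_95(n) for rate, r1, n in table}
for rate, r1, n in table:
    print(f"{rate:14s} r1 = {r1:.3f}  limit = ±{limit_95(n):.3f}  significant = {verdicts[rate]}")
```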
4.2.3  Demonstration of Procedures.

The charts identified in Chapter 3 for monitoring p-path will be demonstrated here. Each chart in this section is shown with its control limits (UCL, CL, and LCL) and its 1-sigma and 2-sigma warning limits.

4.2.3.1  Proportion of Operating Paths. As stated in Section 4.2.1, the measure p-path appears to be a proportion based on a binomial count of the number of operational paths with parameters n and p. Even so, the measure has three possible control charting techniques: the p chart, the XmR charts, and the x-bar and R charts. For the p chart, p-bar (and 1 - p-bar) and the corresponding control limits are calculated below (one month of hourly observations was used for estimation):


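$$1 - \bar{p} = \frac{\sum_{i=1}^{m} (n - D_i)}{mn} = \frac{\sum_{i=1}^{m} (1 - \hat{p}_i)}{m} = 0.6976$$

p chart for p-path:

$$\mathrm{UCL} = (1 - \bar{p}) + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.6976 + 3\sqrt{\frac{(0.6976)(0.3024)}{198}} = 0.7955$$

$$\mathrm{CL} = 1 - \bar{p} = 0.6976$$

$$\mathrm{LCL} = (1 - \bar{p}) - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = 0.6976 - 3\sqrt{\frac{(0.6976)(0.3024)}{198}} = 0.5997$$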
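The p chart arithmetic above can be checked with a few lines of Python. This minimal sketch uses only the values in the text: n = 198 paths and a center of 1 - p-bar = 0.6976.

```python
import math

n = 198                  # enumerated s-t paths
center = 0.6976          # 1 - p-bar, estimated from one month of hourly data
sigma = math.sqrt(center * (1 - center) / n)   # binomial standard error of a proportion
ucl = center + 3 * sigma
lcl = center - 3 * sigma
print(f"UCL = {ucl:.4f}, CL = {center:.4f}, LCL = {lcl:.4f}, sigma = {sigma:.4f}")
```

The binomial sigma here is only about 0.033. That narrow width becomes important in the discussion that follows.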

The p chart is shown in Figure 4.12. The runs rules do not even have to be applied to this chart to see that many points are plotting out-of-control. Since this data was generated from an in-control network, either this chart is giving many out-of-control ‘false alarms’ or some other cause is responsible. If this were not known to be data from an in-control process, this chart would clearly indicate an out-of-control network. The network may have a very high amount of inherent variability with regard to this performance measure. But that raises a question: if there is so much inherent variability, why are the control limits so narrow? Remember, the control limits on this chart were calculated using standards estimated from the very data that is now plotted on them. A high amount of variability in the data should therefore cause the control limits computed from this data to be wide. The XmR chart will help explain this problem and is therefore investigated next.


For the XmR charts, the control limits also need to be calculated with estimated standards. For comparison with the p chart done earlier, this is done without assuming that the data come from a binomial distribution. One month of hourly observations was used to estimate the standards for the control limits.


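estimated standards for p-path:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_m}{m} = 0.6914$$

$$\overline{mR} = \frac{1}{m-1}\sum_{i=2}^{m} mR_i = 0.2035, \qquad mR_i = |x_i - x_{i-1}|$$

where x-bar is the average of the individual proportions observed every hour, x_i, and m = 168. The control limits for the X chart are:

$$\mathrm{UCL} = \bar{x} + 2.66\,\overline{mR} = 0.6914 + 2.66(0.2035) = 1.2327, \text{ taken as } 1$$

$$\mathrm{CL} = \bar{x} = 0.6914$$

$$\mathrm{LCL} = \bar{x} - 2.66\,\overline{mR} = 0.6914 - 2.66(0.2035) = 0.1502$$

and the control limits for the mR chart are:

$$\mathrm{UCL} = D_4\,\overline{mR} = 0.2035(3.267) = 0.6648$$

$$\mathrm{CL} = \overline{mR} = 0.2035$$

$$\mathrm{LCL} = D_3\,\overline{mR} = 0 \text{ always}$$

where D3 = 0 and D4 = 3.267.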
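A minimal Python sketch of the XmR limit calculation, using the estimated standards from the text (x-bar = 0.6914, mR-bar = 0.2035) and the constants 2.66, D3 = 0, and D4 = 3.267. The helper names `xmr_limits` and `moving_ranges` are hypothetical. The results agree with the limits above to within rounding of the inputs. The X chart UCL is capped at 1 because a proportion cannot exceed 1.

```python
def moving_ranges(x):
    """mR_i = |x_i - x_{i-1}| for i = 2..m (moving ranges of two points)."""
    return [abs(b - a) for a, b in zip(x, x[1:])]


def xmr_limits(xbar, mrbar, e2=2.66, d3=0.0, d4=3.267):
    """X chart (LCL, CL, UCL) and mR chart (LCL, CL, UCL) from x-bar and mR-bar."""
    x_chart = (xbar - e2 * mrbar, xbar, xbar + e2 * mrbar)
    mr_chart = (d3 * mrbar, mrbar, d4 * mrbar)
    return x_chart, mr_chart


(x_lcl, x_cl, x_ucl), (mr_lcl, mr_cl, mr_ucl) = xmr_limits(0.6914, 0.2035)
x_ucl_plotted = min(x_ucl, 1.0)   # a proportion cannot exceed 1
print(f"X:  LCL={x_lcl:.4f} CL={x_cl:.4f} UCL={x_ucl:.4f} (plotted as {x_ucl_plotted})")
print(f"mR: LCL={mr_lcl:.4f} CL={mr_cl:.4f} UCL={mr_ucl:.4f}")
```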
The XmR charts for p-path are shown in Figure 4.13 and Figure 4.14. Notice how wide the control limits on the X chart are. This seems to support the previous theory of a highly variable network in relation to p-path. Three points (Samples 13, 40, and 63) on the X chart and three points (Samples 14 and 41, along with Sample 13 as part of a ‘2 of 3’ run) on the mR chart are plotting out-of-control. As they should, these ‘spikes’ in the mR chart correspond to the out-of-control points on the X chart. Closely examining Samples 13, 40, and 63 on the X chart shows that each has an adjacent point above or near the centerline. This makes the ranges between these sample points and their adjacent points large, producing the corresponding out-of-control points on the mR chart. Even if there were no out-of-control points on the X chart, a point near the LCL followed by a point near the UCL would show up as an out-of-control point on the mR chart. The X chart limits span up to 2(2.66) = 5.32 times mR-bar, while the mR chart's UCL is only 3.267 times mR-bar. Such a pattern may indicate that the variability in p-path has shifted rather than its mean value. Searching for assignable causes of shifts in variability is just as important as searching for causes of shifts in the mean. Out-of-control variability changes from time to time and indicates inconsistency and instability in the process being measured (28:7).

But in this case, the network seems to be inherently variable in relation to p-path, the proportion of operating paths. Referring to the case study network in Figure 3.4 and Table 3.1, some links in the network are more ‘important’ than others, meaning that more paths depend on them to operate. For example, if link number 13 (between nodes 11 and 12) fails, all paths connecting through nodes 13 to 22 and 25 to 37 will be down. This makes link number 13 a ‘critical’ link in the network. Its failure in conjunction with other link failures could be exactly the situation causing the extremely low out-of-control points on the X chart. A search for an assignable cause could certainly reveal such an overt problem.


Examining the data from the simulation reveals which ‘important’ links failed during the one-hour period before Sample 13's collection (the hourly data collection occurs at the end of the hour): 9 (twice), 11, 13, 14, 18, 25, and 60. Referring again to Figure 3.4 and Table 3.1, any combination of these links failing together could have a devastating effect on the number of paths operating. Similar information surfaced for Samples 40 and 63. All three of the one-hour periods preceding the out-of-control samples contained a failure of link number 13, the critical link mentioned earlier. If such a drastic condition of the network is possible through the failure of just one link, the measure p-path will be highly variable, with wide control limits as seen in the X chart. This brings the discussion back to the p chart attempted earlier.

If the measure p-path shows so much variability, why does that variability not show up in the control limits on the p chart? Recall from Section 3.1.3 the conditions necessary for a binomial probability model to be appropriate. Although the measure p-path seems well suited to monitoring on a p chart, it violates an assumption of the binomial probability model. This assumption requires that p be the probability that any ‘unit’ will not conform. This implies that the value of p is the same for all n ‘units’ in a sample and that the units behave independently.

For p-path, a ‘unit’ is a path, and the paths are neither alike nor independent of one another. Each path may contain a different number of links, and many paths share common links. As a result, p-path is not a proportion based on a binomial count and should not be monitored on a p chart. As Wheeler and Chambers state, if the binomial probability model is not appropriate, an XmR chart should be used instead of a p chart (28:260).
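The numbers already computed in this section show how far the binomial model understates the spread of p-path. This minimal sketch compares two estimates of the standard deviation. The binomial one uses n = 198 and center 0.6976. The empirical one is mR-bar / d2 = 0.2035 / 1.128, which is the sigma implicit in the XmR chart, since 3/1.128 ≈ 2.66. The two centers (0.6976 and 0.6914) differ slightly, as in the text. This does not change the conclusion.

```python
import math

n = 198
center = 0.6976                                 # binomial-based center line
sigma_binomial = math.sqrt(center * (1 - center) / n)

mrbar = 0.2035                                  # average moving range of hourly p-path
d2 = 1.128                                      # bias constant for moving ranges of two points
sigma_from_mr = mrbar / d2                      # empirical sigma of individual values

ratio = sigma_from_mr / sigma_binomial
print(f"binomial sigma = {sigma_binomial:.4f}, XmR sigma = {sigma_from_mr:.4f}, ratio = {ratio:.2f}")
```

The empirical sigma is roughly five and a half times the binomial sigma. This accounts for the narrow p chart limits and the many out-of-control points in Figure 4.12.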
For the x-bar and R charts, the standards are again estimated and used to compute the control limits below. One month of hourly data, subgrouped into m = 30 daily samples of n = 24, was used for estimation and plotting:

$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_m}{m} = 0.6914$$

$$\bar{R} = \frac{R_1 + R_2 + \cdots + R_m}{m} = 0.7833, \qquad R_i = x_{\max,i} - x_{\min,i}$$

The control limits for the x-bar chart are:

$$\mathrm{UCL} = \bar{\bar{x}} + A_2\bar{R} = 0.6914 + 0.157(0.7833) = 0.8145$$

$$\mathrm{CL} = \bar{\bar{x}} = 0.6914$$

$$\mathrm{LCL} = \bar{\bar{x}} - A_2\bar{R} = 0.6914 - 0.157(0.7833) = 0.5682$$

where A2 = 0.157 for n = 24.

The control limits for the R chart are:

$$\mathrm{UCL} = \bar{R} + 3d_3\frac{\bar{R}}{d_2} = 0.7833 + 3(0.712)\left(\frac{0.7833}{3.895}\right) = 1.2129$$

$$\mathrm{CL} = \bar{R} = 0.7833$$

$$\mathrm{LCL} = \bar{R} - 3d_3\frac{\bar{R}}{d_2} = 0.7833 - 3(0.712)\left(\frac{0.7833}{3.895}\right) = 0.3533$$

where d2 = 3.895 and d3 = 0.712 for n = 24.
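A minimal sketch of the x-bar and R limit calculation with the standards above. Instead of the rounded table value, it computes A2 from its definition, A2 = 3/(d2√n), which gives about 0.1572 for n = 24. The helper name `xbar_r_limits` is hypothetical. The x-bar limits and the R chart UCL agree with the values above to within rounding. The R chart LCL evaluates to about 0.3537 from the stated constants.

```python
import math


def xbar_r_limits(xbarbar, rbar, d2, d3, n):
    """x-bar chart and R chart limits (LCL, CL, UCL) using A2 = 3 / (d2 * sqrt(n))
    and R-bar +/- 3 * d3 * (R-bar / d2), with the R chart LCL floored at zero."""
    a2 = 3.0 / (d2 * math.sqrt(n))
    xbar_chart = (xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar)
    half_width = 3.0 * d3 * rbar / d2
    r_chart = (max(0.0, rbar - half_width), rbar, rbar + half_width)
    return a2, xbar_chart, r_chart


# p-path, daily subgroups of n = 24 hourly values (standards and constants from the text).
a2, (x_lcl, x_cl, x_ucl), (r_lcl, r_cl, r_ucl) = xbar_r_limits(
    0.6914, 0.7833, d2=3.895, d3=0.712, n=24)
print(f"A2 = {a2:.4f}")
print(f"x-bar: LCL={x_lcl:.4f} CL={x_cl:.4f} UCL={x_ucl:.4f}")
print(f"R:     LCL={r_lcl:.4f} CL={r_cl:.4f} UCL={r_ucl:.4f}")
```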
The x-bar and R charts are shown in Figure 4.15 and Figure 4.16. The hourly data is subgrouped into daily samples of size n = 24. Applying the runs rules shows that only one point (Sample 10) on the x-bar chart and no points on the R chart are plotting out-of-control. This point was investigated for demonstration purposes only, since the samples were generated from the in-control simulation. Sample 10's 24 data points include 5 values of 0.6, 5 values of 0.5, 1 value of 0.4, and 4 values of 0.1. With an LCL value of 0.5682, over half of the data points were below this limit. Once again, if this sample had come from an actual network that was not known to be in-control, a search for an assignable cause of the low values would be initiated. Averaging the data seems to have compensated for the high variability of the individual observed p-path points. Both the data points and the control limits on the x-bar chart show less variability than those on the X chart. The R chart, which is designed to monitor variability in the data, has no out-of-control points. But, as with the count-based measure DwnLnk, grouping the data for an x-bar chart changes the time interval being compared: days are now compared instead of hours. Therefore, the high variability of the network with respect to p-path may lessen over a longer time interval.

Note the correspondence between two Sample 10 values: the low value on the x-bar chart for p-path and the high value on the x-bar chart for DwnLnk (see Figure 4.7). This correspondence conveys the influence of the number of down links on the proportion of paths operating. It makes intuitive sense, since a higher number of down links would be expected to lower the proportion of operating paths.

Note, however, that comparing Figure 4.5 with Figure 4.13 shows that the same correspondence is not present in the charts of the individual data points. The relationship between the two measures, DwnLnk and p-path, seems to be more indirect. It is detectable in the averages of samples over a longer period of time rather than in the individual data points themselves.
Figure 4.15  x-bar Chart for p-path (1/hr collection rate - daily samples)


Both the XmR charts and the x-bar and R charts seem viable for monitoring p-path. As indicated for DwnLnk, the choice between these two types of charts will be based on the desired rate of detection for out-of-control conditions. Another consideration is that aggregating the data on the x-bar and R charts can reveal out-of-control points that do not appear on the XmR charts. The x-bar and R charts could be more sensitive to shifts in the measure p-path than the XmR charts because of the larger subgroups (n = 24 compared to n = 1) (28:157). However, Montgomery states that two sampling schemes are comparable for detecting shifts in the same amount of time: smaller samples taken more frequently (n = 1 every hour) and larger samples taken less frequently (n = 24 every 24 hours) (18:111-13). For the sample sizes given, this holds only if detection within one day is acceptable. If a shift must be detected within a few hours, the daily grouping is not satisfactory. So in this case, user preference is the ultimate judge.
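The hourly-versus-daily trade-off can be sketched with average run lengths. This is a minimal illustration under strong assumptions that p-path does not strictly satisfy: normally distributed, independent observations, fixed three-sigma limits with known sigma, and a sustained mean shift starting at the beginning of a subgroup. Shifts are expressed in units of the individual-value sigma. The function names are hypothetical. Under these assumptions, a one-sigma shift takes roughly 44 hours to detect on the hourly X chart but about 25 hours on the daily x-bar chart. A two-sigma shift takes about 6 hours on the hourly chart but still a full day on the daily chart.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hours_to_detect(shift_sigma, n, hours_per_sample):
    """Average hours until a 3-sigma chart of subgroup means signals after a
    sustained mean shift of shift_sigma (in units of the individual-value sigma).
    Assumes normal, independent observations and known limits."""
    delta = shift_sigma * math.sqrt(n)        # shift in units of sigma / sqrt(n)
    p_signal = (1.0 - phi(3.0 - delta)) + phi(-3.0 - delta)
    arl = 1.0 / p_signal                      # average run length, in samples
    return arl * hours_per_sample


for shift in (1.0, 2.0):
    hourly = hours_to_detect(shift, n=1, hours_per_sample=1)
    daily = hours_to_detect(shift, n=24, hours_per_sample=24)
    print(f"{shift:.0f}-sigma shift: hourly X chart {hourly:.1f} h, daily x-bar chart {daily:.1f} h")
```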

4.2.4  Degradation  Monitoring. 

Once again, degradation will be monitored indirectly by observing another performance measure, p-path. Through the runs rules, p-path can indirectly measure network degradation by monitoring for decaying conditions, trends, or abrupt shifts, just as the rules did for the overall performance measures in Section 4.1. Degradation of the network is assumed to reveal itself in one of two ways: more links failing over time, or more failures in a specific link contained in more than one path. Either way, fewer paths operate over time, so a decrease in p-path indicates degradation in the network.

4.2.4.1  Degradation Case Study. Now the (s-t) viewpoint will be investigated for detection of link number 13's degradation. The data is plotted on XmR charts for hourly p-path, shown in Figure 4.17 and Figure 4.18. As before, link 13 is abruptly degraded in Sample 25. The X chart in Figure 4.17 shows an increase in the number of out-of-control points after Sample 25. This measure is inherently variable to begin with, but it seems even more so after the degradation takes place. Even with the wide control limits resulting from the high variability, points plot out-of-control more often after Sample 25 than before. Therefore, the X chart has detected the degradation. The mR chart in Figure 4.18 also shows an increase in variability after Sample 25, a concurrent indication of some type of degradation. If the assignable cause were not known for this out-of-control indication, a search for one should be initiated. Due to the high variability, though, it may not be obvious that the assignable cause began at Sample 25. Sample 31 would probably be chosen as the starting point for the search, since it is the first out-of-control point after which out-of-control points become more frequent. Hence, p-path is able to detect the degradation even though the high variability in this measure tends to mask it.

Figure 4.17  X Chart for Hourly p-path (Link 13 degraded)

Figure 4.18  mR Chart for Hourly p-path (Link 13 degraded)

4.3  Individual  Link  Performance  Measures 

Many measures were identified as potential indicators of individual link performance:

- link availability (Availability)
- the proportion of times a link is operating when it is checked at regular intervals (p-link)
- Time to Failure (TTF)
- Time to Repair (TTR)
- Time Between Failures (TBF)
- Mean Time to Failure (MTTF)
- Mean Time to Repair (MTTR)
- Mean Time Between Failures (MTBF)
- link steady-state availability (SSA)

Theoretical considerations of these measures will be discussed, followed by a demonstration of each measure and a discussion of degradation monitoring using these measures.
4.3.1  Theoretical Considerations.

The measure Availability is defined as the proportion of time that a link is operating over an interval of time. This is a short-term version of the steady-state availability measure SSA, which is defined as the long-run proportion of time that a link is operating. The theoretical value for SSA is computed via

$$\mathrm{SSA} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} = \frac{12259}{12259 + 754} = 0.942$$

using the theoretical values for MTTF and MTTR assumed for an individual link. MTTR was estimated from summary data provided by the sponsor (MTTR = 754 seconds), along with the MTTF over all links (MTTF_overall = 169). As mentioned in Section 3.2.3, the steady-state equations for the M/M/s queueing model are solved in Appendix D to obtain the individual link MTTFs. Recall from Section 3.2.1 that the failure and repair time distributions are assumed to be iid for all links, in which case the same theoretical SSA value can be used for all links. This value for SSA will be used as the mean (μ) for both Availability and SSA. Their standard deviations (σ), however, will be estimated from the data, since the theoretical distribution of these availability measures is unknown. The estimated standards are (1 week of hourly data):

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_m}{m} = 0.9404$$

$$\overline{mR} = \frac{1}{m-1}\sum_{i=2}^{m} mR_i = 0.0925$$

The estimated mean x-bar is quite close to the theoretical mean described above (0.9421). The estimated mean will therefore be used, to show that estimated standards work just as well as the theoretical standards. The corresponding control limits are:
X chart for hourly Availability:

$$\mathrm{UCL} = \bar{x} + 2.66\,\overline{mR} = 0.9404 + 2.66(0.0925) = 1.1865, \text{ taken as } 1$$

$$\mathrm{CL} = \bar{x} = 0.9404$$

$$\mathrm{LCL} = \bar{x} - 2.66\,\overline{mR} = 0.9404 - 2.66(0.0925) = 0.6944$$

mR chart for hourly Availability:

$$\mathrm{UCL} = D_4\,\overline{mR} = 0.0925(3.267) = 0.3022$$

$$\mathrm{CL} = \overline{mR} = 0.0925$$

$$\mathrm{LCL} = D_3\,\overline{mR} = 0 \text{ always}$$

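The estimated standards for daily Availability are (1 month of daily data):

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_m}{m} = 0.9416$$

$$\overline{mR} = \frac{1}{m-1}\sum_{i=2}^{m} mR_i = 0.0344$$

X chart for daily Availability:

$$\mathrm{UCL} = \bar{x} + 2.66\,\overline{mR} = 0.9416 + 2.66(0.0344) = 1.0331, \text{ taken as } 1$$

$$\mathrm{CL} = \bar{x} = 0.9416$$

$$\mathrm{LCL} = \bar{x} - 2.66\,\overline{mR} = 0.9416 - 2.66(0.0344) = 0.8501$$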
mR chart for daily Availability:

$$\mathrm{UCL} = D_4\,\overline{mR} = 0.0344(3.267) = 0.1124$$

$$\mathrm{CL} = \overline{mR} = 0.0344$$

$$\mathrm{LCL} = D_3\,\overline{mR} = 0 \text{ always}$$
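A minimal Python sketch that checks the SSA value and the hourly and daily Availability XmR limits above, using only the standards and constants given in the text. The helper `xmr_limits` is hypothetical. The X chart limits are clipped to [0, 1] because Availability is a proportion. This is why both UCLs are plotted at 1.

```python
MTTF, MTTR = 12259.0, 754.0      # per-link theoretical values, seconds (from the text)
ssa = MTTF / (MTTF + MTTR)       # steady-state availability


def xmr_limits(xbar, mrbar, e2=2.66, d4=3.267):
    """X chart and mR chart limits (LCL, CL, UCL); X limits clipped to [0, 1]."""
    x_chart = (max(0.0, xbar - e2 * mrbar), xbar, min(1.0, xbar + e2 * mrbar))
    mr_chart = (0.0, mrbar, d4 * mrbar)      # D3 = 0, so the mR LCL is always 0
    return x_chart, mr_chart


hourly_x, hourly_mr = xmr_limits(0.9404, 0.0925)
daily_x, daily_mr = xmr_limits(0.9416, 0.0344)
print(f"SSA = {ssa:.4f}")
print("hourly X:", [round(v, 4) for v in hourly_x], "mR UCL:", round(hourly_mr[2], 4))
print("daily  X:", [round(v, 4) for v in daily_x], "mR UCL:", round(daily_mr[2], 4))
```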


The estimated standards for hourly Availability grouped into daily samples are (1 week of hourly data):

$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_m}{m} = 0.9397$$

$$\bar{R} = \frac{R_1 + R_2 + \cdots + R_m}{m} = 0.3615$$

The control limits for the x-bar chart are:

$$\mathrm{UCL} = \bar{\bar{x}} + A_2\bar{R} = 0.9397 + 0.157(0.3615) = 0.9965$$

$$\mathrm{CL} = \bar{\bar{x}} = 0.9397$$

$$\mathrm{LCL} = \bar{\bar{x}} - A_2\bar{R} = 0.9397 - 0.157(0.3615) = 0.8829$$

where A2 = 0.157 for n = 24.

The control limits for the R chart are:

$$\mathrm{UCL} = \bar{R} + 3d_3\frac{\bar{R}}{d_2} = 0.3615 + 3(0.712)\left(\frac{0.3615}{3.895}\right) = 0.5597$$

$$\mathrm{CL} = \bar{R} = 0.3615$$

$$\mathrm{LCL} = \bar{R} - 3d_3\frac{\bar{R}}{d_2} = 0.3615 - 3(0.712)\left(\frac{0.3615}{3.895}\right) = 0.1633$$

where d2 = 3.895 and d3 = 0.712 for n = 24.


The proportion p-link is another measure based on a binomial count. It is similar to the measure p-up, since it also checks for operating links at 300-second intervals. In contrast, though, p-link checks a single link's status every 300 seconds, and these checks are then aggregated to produce a count over an hourly or a daily interval. (p-up has a sample size of 77 because it checks all links every 300 seconds instead of just one link.) The hourly interval contains 12 checks per sample, and the daily interval contains 288 checks per sample. The data is grouped because only one link is being checked. If each check were plotted individually, the only values plotted would be ones (link is up) or zeroes (link is down). Hourly and daily samples are therefore a natural grouping. As with the earlier proportion measures p-up and p-down, the value of p has remained the same (p = 4.46/77). It still represents the probability that a link is down at an arbitrary point in time, but the value of n has changed (n = 12 hourly, n = 288 daily) for this binomial distribution. The binomial model can be used for this proportion, p-link, because each link's successive failures are independent of each other, just as each link's successive times-to-failure are independent of each other. Hence the probability p that a link is down at some arbitrary point in time is the same at each check of the link. Therefore, the mean (μ) and standard deviation (σ) for p-link are defined and are used to compute the following control limits:


p  chart  for  hourly  p-link: 


p  chart  for  daily  p-link: 


Because these standards are known for p-link (for the case study network model), they do not need to be estimated from the sample data.
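A sketch of how three-sigma p chart limits for p-link can be computed from the known standard. It assumes the chart plots the proportion of checks on which the link is up, so the center line is 1 - 4.46/77 ≈ 0.9421, which agrees with the theoretical SSA of 0.942. It also assumes the 300-second checks are independent, as the text argues. The helper name `p_link_limits` is hypothetical. With only 12 checks per hour, the hourly upper limit exceeds 1 and is clipped there. The daily limits (n = 288) are much tighter.

```python
import math

P_DOWN = 4.46 / 77      # probability a link is down at an arbitrary check (from the text)


def p_link_limits(n_checks, k=3.0):
    """Three-sigma p chart limits (LCL, CL, UCL) for the proportion of n_checks
    status checks (one every 300 s) on which a single link is found operating."""
    center = 1.0 - P_DOWN
    sigma = math.sqrt(P_DOWN * (1.0 - P_DOWN) / n_checks)
    return max(0.0, center - k * sigma), center, min(1.0, center + k * sigma)


hourly_limits = p_link_limits(12)    # 3600 s / 300 s = 12 checks per hour
daily_limits = p_link_limits(288)    # 86400 s / 300 s = 288 checks per day
print("hourly:", [round(v, 4) for v in hourly_limits])
print("daily: ", [round(v, 4) for v in daily_limits])
```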

TTF, TTR, and TBF all come from known (exponential) distributions, with standards provided by the sponsor as mentioned earlier in this section. The sample MTTF, MTTR, and MTBF are averages of their respective failure/repair times, so they can take their standards from those of the underlying failure/repair times. The theoretical means of the exponential TTF and TTR distributions were programmed into the simulation model and can therefore be used as theoretical values for the control charts. The individual link TBF (and sample MTBF) is calculated as the sum of TTF (sample MTTF) and TTR (sample MTTR). These values for MTTF, MTTR, and MTBF are used as the means of their respective charts. They also serve as the means for the corresponding individual value charts (TTF, TTR, and TBF). Since the distributions of TTF, TTR, and TBF are assumed to be exponential, the standard deviations of these measures equal their respective means. The standard deviations of MTTF, MTTR, and MTBF equal their respective means divided by the square root of m, the number of individual times used in the average. These values are:


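$$\mu_{MTTF} = \mu_{TTF} = 12259$$
$$\sigma_{TTF} = 12259$$
$$\sigma_{MTTF} = 12259/\sqrt{m}$$

$$\mu_{MTTR} = \mu_{TTR} = 754$$
$$\sigma_{TTR} = 754$$
$$\sigma_{MTTR} = 754/\sqrt{m}$$

$$\mu_{MTBF} = \mu_{TBF} = 13013$$
$$\sigma_{TBF} = 13013$$
$$\sigma_{MTBF} = 13013/\sqrt{m}$$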
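Because the sample MTTF, MTTR, and MTBF are averages, their standard deviations shrink with the square root of m, the number of times averaged. A minimal Python sketch of this relationship, with a Monte Carlo check under the exponential assumption used in the text, follows; the function names and the choice m = 10 are illustrative assumptions, not values from the case study.

```python
import math
import random
import statistics

def sd_of_average(sigma, m):
    """Standard deviation of the mean of m independent values with std dev sigma."""
    return sigma / math.sqrt(m)

def simulated_sd_of_average(mean, m, reps=20000, seed=1):
    """Monte Carlo check: std dev of averages of m exponential draws with the given mean."""
    rng = random.Random(seed)
    averages = [statistics.fmean(rng.expovariate(1 / mean) for _ in range(m))
                for _ in range(reps)]
    return statistics.stdev(averages)

m = 10  # illustrative number of failures per average (an assumption)
print(sd_of_average(12259, m))            # theoretical sigma of a sample MTTF
print(simulated_sd_of_average(12259, m))  # should be close to the value above
```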


Thus,  the  standards  for  all  six  of  these  performance  measures’  distributions  are  known 
when  m  is  determined  (for  the  case  study  network  model)  and  do  not  need  to  be  estimated 
from  the  sample  data.  The  corresponding  control  limits  for  TTF,  TTR,  and  TBF  are 
calculated  below: 


X chart for TTF:

$$UCL = \mu + 3\left(\frac{\overline{mR}}{d_2}\right) = 12259 + 3(12259) = 49036$$
$$CL = \mu = 12259$$
$$LCL = \mu - 3\left(\frac{\overline{mR}}{d_2}\right) = 12259 - 3(12259) = -24518 \rightarrow 0$$

where, for n = 2:

$$\overline{mR} = \sigma d_2, \qquad d_2 = 1.128$$


mR chart for TTF:

$$UCL = \overline{mR}\,D_4 = 12259(1.128)(3.267) = 45176.6$$
$$CL = \overline{mR} = 12259(1.128) = 13828.2$$
$$LCL = \overline{mR}\,D_3 = 0$$ (always; usually not annotated on the chart)

where, for n = 2:

$$D_3 = 0, \qquad D_4 = 3.267$$


X chart for TTR:

$$UCL = \mu + 3\left(\frac{\overline{mR}}{d_2}\right) = 754 + 3(754) = 3016$$
$$CL = \mu = 754$$
$$LCL = \mu - 3\left(\frac{\overline{mR}}{d_2}\right) = 754 - 3(754) = -1508 \rightarrow 0$$


mR chart for TTR:

$$UCL = \overline{mR}\,D_4 = 754(1.128)(3.267) = 2778.6$$
$$CL = \overline{mR} = 754(1.128) = 850.5$$
$$LCL = \overline{mR}\,D_3 = 0$$ (always; usually not annotated on the chart)


X chart for TBF:

$$UCL = \mu + 3\left(\frac{\overline{mR}}{d_2}\right) = 13013 + 3(13013) = 52052$$
$$CL = \mu = 13013$$
$$LCL = \mu - 3\left(\frac{\overline{mR}}{d_2}\right) = 13013 - 3(13013) = -26026 \rightarrow 0$$
mR chart for TBF:

$$UCL = \overline{mR}\,D_4 = 13013(1.128)(3.267) = 47955.2$$
$$CL = \overline{mR} = 13013(1.128) = 14678.7$$
$$LCL = \overline{mR}\,D_3 = 0$$ (always; usually not annotated on the chart)
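The limit calculations above all follow one pattern, so they can be reproduced with a short function. This is a minimal Python sketch, assuming known standards (σ equal to the mean, as for the exponential distributions in the text) and the n = 2 constants d2 = 1.128, D3 = 0, D4 = 3.267 quoted above; since mR̄ = σd2, the term mR̄/d2 in the X chart limits is simply σ. The function name `xmr_limits` is hypothetical.

```python
D2, D3, D4 = 1.128, 0.0, 3.267   # control chart constants for moving ranges of size n = 2

def xmr_limits(mu, sigma, nonnegative=True):
    """X and mR chart limits when the standards mu and sigma are known."""
    x_ucl = mu + 3 * sigma
    x_lcl = mu - 3 * sigma
    if nonnegative:               # times cannot be negative, so truncate the LCL at 0
        x_lcl = max(0.0, x_lcl)
    mr_cl = sigma * D2            # expected average moving range
    return {
        "X": (x_lcl, mu, x_ucl),
        "mR": (mr_cl * D3, mr_cl, mr_cl * D4),
    }

# Exponential standards from the text: sigma equals the mean.
for name, mean in (("TTF", 12259), ("TTR", 754), ("TBF", 13013)):
    limits = xmr_limits(mean, mean)
    x_lcl, x_cl, x_ucl = limits["X"]
    mr_lcl, mr_cl, mr_ucl = limits["mR"]
    print(f"{name}: X ({x_lcl:.0f}, {x_cl:.0f}, {x_ucl:.0f})  "
          f"mR ({mr_lcl:.1f}, {mr_cl:.1f}, {mr_ucl:.1f})")
```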

4.3.2  Autocorrelation  of  Data  Points. 

All autocorrelation computations use data from Link #1 for demonstration purposes. The measure p-link is the only measure in this section collected every 300 seconds, but it is then aggregated to produce a count over an hourly and a daily interval. Thus, the data points collected once every hour and once every day are investigated to ensure there is no autocorrelation. The resulting autocorrelation at lag 1 and its 95% standard error limits are shown in Table 4.3 for one month of data.


Table 4.3 Autocorrelation for p-link

Collection Rate   r1        95% Limits            Sample Size   Autocorrelation
1/hour            0.023     -0.073 < r1 < 0.073   720           not significant
1/day             0.246     -0.358 < r1 < 0.358   30            not significant

Link  Availability  is  checked  next  for  autocorrelation.  This  measure  is  computed  over  an 
interval  of  one  hour  and  also  one  day.  The  results  are  shown  in  Table  4.4. 


Table 4.4 Autocorrelation of Availability

Collection Rate   r1        95% Limits            Sample Size   Autocorrelation
1/hour            0.109     -0.073 < r1 < 0.073   720           slight
1/day             -0.110    -0.358 < r1 < 0.358   30            not significant
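The limits in Tables 4.3 and 4.4 agree with the common large-sample approximation ±1.96/√n for the lag-1 autocorrelation of an uncorrelated series (0.073 for n = 720 and 0.358 for n = 30). A minimal Python sketch of the lag-1 computation and that check follows; it assumes the usual estimator with deviations taken about the overall mean, which may differ in detail from the software used in the study, and the function names are hypothetical.

```python
import math

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1 (deviations taken about the overall mean)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den

def approx_95_limit(n):
    """Approximate 95% limit for r1 under no autocorrelation: 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)

# Values reported in Tables 4.3 and 4.4
for label, r1, n in (("p-link hourly", 0.023, 720), ("Availability hourly", 0.109, 720),
                     ("Availability daily", -0.110, 30)):
    limit = approx_95_limit(n)
    verdict = "significant" if abs(r1) > limit else "not significant"
    print(f"{label}: r1={r1:+.3f}, limit=±{limit:.3f} -> {verdict}")
```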
The next measures checked are TTF, TTR, TBF, and the sample MTTF, MTTR, and MTBF. These measures are all collected after each failure and repair of a link. Since the link failures are assumed in Chapter 1 to be independent, TTF, TTR, and TBF should not be autocorrelated. Their lag-1 autocorrelations agree with this assumption, with values of -0.011, -0.049, and -0.011, respectively. In contrast, if the sample MTTF, MTTR, and MTBF are cumulative averages of all past TTFs, TTRs, and TBFs, respectively, then they are all expected to be autocorrelated. In addition, they are all expected to converge to their respective theoretical means. Their corresponding lag-1 autocorrelations are indeed high, at 0.880, 0.609, and 0.875. This autocorrelation can easily be seen in the time series plot of the cumulative sample MTTF of Link #1 shown in Figure 4.19.
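Why cumulative averages are autocorrelated even when the underlying times are independent can be seen by simulation. This minimal Python sketch draws independent exponential TTFs with the case-study mean of 12259 seconds (the sample size of 500 and the seed are arbitrary assumptions) and compares the lag-1 autocorrelation of the raw values with that of their running average. Each running average shares all but one of its terms with the previous one, so successive values are nearly equal.

```python
import random

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1 (deviations taken about the overall mean)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / sum(d * d for d in dev)

def cumulative_means(x):
    """Running average of all values observed so far (a 'cumulative' sample MTTF)."""
    out, total = [], 0.0
    for i, v in enumerate(x, start=1):
        total += v
        out.append(total / i)
    return out

rng = random.Random(42)
ttf = [rng.expovariate(1 / 12259) for _ in range(500)]   # independent exponential TTFs
print("raw TTF r1:        ", round(lag1_autocorrelation(ttf), 3))
print("cumulative MTTF r1:", round(lag1_autocorrelation(cumulative_means(ttf)), 3))
```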


[Plot of cumulative MTTF versus Sample Number]

Figure 4.19 Time Series Plot of Cumulative MTTF Link #1
As indicated earlier in Section 4.1.2, autocorrelated data cannot be used on the standard control charts being investigated in this study. Therefore, the sample MTTF, MTTR, and MTBF should not be cumulative; instead, they should be calculated over a specified time period. The choice of this time period can be based on the theoretical mean of the measure: the interval should be long enough to allow enough failures to occur for an accurate calculation. The individual link failure rate for the case study network is 1 failure/12259 seconds (or 1 failure/3.4 hours). Therefore, an hourly collection rate would not be appropriate. A daily collection rate would be much better and is investigated here for autocorrelation using Link #1's daily MTTF as an example. A time series plot of Link #1's daily MTTF is shown in Figure 4.20. This plot does not appear to show significant autocorrelation (confirmed by a lag-1 autocorrelation of -0.338 for a sample size of 30, which lies within the ±0.358 limits), but it is definitely converging toward its theoretical mean. With this kind of convergence in these measures (MTTF, MTBF, and MTTR), charting them does not seem to provide any useful information. Their corresponding individual measurements TTF, TBF, and TTR are still viable, though, and will be used to monitor the network.
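The argument for a daily rather than an hourly interval can be quantified. A minimal Python sketch, assuming a constant failure rate of 1/12259 per second while the link is up (it ignores repair downtime, so the counts are slightly optimistic), computes the expected number of failures and the probability of no failure in each interval; the function names are hypothetical.

```python
import math

MTTF = 12259.0  # mean time to failure of an individual link, seconds (case study value)

def expected_failures(interval_s, mttf=MTTF):
    """Expected number of failures in an interval at a constant failure rate 1/mttf."""
    return interval_s / mttf

def prob_no_failure(interval_s, mttf=MTTF):
    """Probability an up link does not fail during the interval (exponential TTF)."""
    return math.exp(-interval_s / mttf)

for label, seconds in (("hour", 3600), ("day", 86400)):
    print(f"per {label}: expected failures = {expected_failures(seconds):.2f}, "
          f"P(no failure) = {prob_no_failure(seconds):.4f}")
```

Under these assumptions roughly three hours in four contain no failure at all, which is consistent with the many hourly values of exactly 1 discussed below, whereas a day contains about seven failures on average.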
[Plot: Daily MTTF Link #1, with series Value and CL(X)]

Figure 4.20 Time Series Plot of Daily MTTF Link #1


A direct consequence of the convergence of the MTTF and MTTR data is that the SSA data will also converge (as it should, since SSA is a steady-state measure), because SSA is computed directly from these two measures. SSA is also autocorrelated, as shown by a lag-1 autocorrelation of 0.587. A time series plot of the SSA data also shows this autocorrelation and a convergence toward 1 as the time interval increases. The time series plot is shown in Figure 4.21.
Figure 4.21 Time Series Plot for SSA Link #1 (Steady State Availability)


4.3.3  Demonstration  of  Procedures. 

The charts identified in Chapter 3 for monitoring the remaining performance measures are demonstrated here. Each chart in this section is shown with its control limits (UCL, CL, and LCL) and 1-sigma and 2-sigma warning limits. All limits in this section are computed using the respective theoretical and estimated standards described in Section 4.3.1.

4.3.3.1 Link Availability. The measure Availability has two possible control charting techniques: XmR charts and x-bar and R charts. This measure is computed over both a 1-hour interval and a 1-day interval. The hourly XmR charts are shown in Figure 4.22 and Figure 4.23, and the daily XmR charts are shown in Figure 4.36 and Figure 4.37.
[Plot: Hourly Availability Link #1; vertical axes Performance Measure Value and mRange]

Figure 4.23 mR Chart for Availability (computed & collected hourly)


Applying the runs rules shows that many points are plotting out-of-control on both hourly charts in Figure 4.22 and Figure 4.23 (the runs at values 1 and 0 are painfully obvious). Since it is known that there are no assignable causes, something else must be causing these out-of-control points. The cause is the failure rate of an individual link (1 failure/3.4 hours) compared to the collection rate (1/hour): not enough failures occur in an hour to compute an accurate value of Availability. Hence, many values of 1 are plotted on the UCL of the X chart, which in turn produce many values of 0 on the LCL of the mR chart. Looking now at the daily charts in Figure 4.36 and Figure 4.37, no points are plotting out-of-control, indicating an in-control network, as expected. As a result, Availability should be computed over an interval of at least one day with the current individual link failure rate.

The x-bar and R charts are shown in Figure 4.26 and Figure 4.27. The hourly data are subgrouped into daily samples of size n = 24. Applying the runs rules to these two charts shows that several points on both charts are plotting out-of-control. Some of these points are clearly outside the control limits on the x-bar chart in Figure 4.26 (i.e., Samples 2, 5, 6, 8, 12, 17, 18, 21, 28, and 29), while other sample points are classified as out-of-control because they belong to a run (i.e., in Figure 4.26, Samples 28 and 29 are a 2-of-3 run and, in Figure 4.27, Samples 12, 13, and 14 are a 3-of-3 run and Samples 20, 21, 22, and 24 are a 4-of-5 run). Recalling the earlier finding from the XmR charts for hourly Availability in Figure 4.22 and Figure 4.23, that Availability should be computed over an interval of at least one day with the current individual link failure rate, the daily aggregation of hourly calculations is not appropriate. A weekly or monthly grouping of daily availability would be more worthwhile.
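The k-of-m runs rules cited above (for example, 2-of-3 and 4-of-5 runs) can be checked mechanically. This minimal Python sketch flags any window of m consecutive points containing at least k points beyond a given number of sigmas on the same side of the center line. The function name, the illustrative data, and the zone choices (2 of 3 beyond 2 sigma, 4 of 5 beyond 1 sigma, as in the usual Western Electric rules) are assumptions, not necessarily the exact rule set implemented in the study's spreadsheets.

```python
def k_of_m_signals(values, center, sigma, k, m, zone):
    """Return 0-based indices of the last point of each window of m consecutive
    points in which at least k points lie beyond center +/- zone*sigma on one side."""
    upper = center + zone * sigma
    lower = center - zone * sigma
    signals = []
    for end in range(m - 1, len(values)):
        window = values[end - m + 1:end + 1]
        above = sum(v > upper for v in window)
        below = sum(v < lower for v in window)
        if above >= k or below >= k:
            signals.append(end)
    return signals

# Illustrative data in sigma units about a center line of 0
data = [0.3, 2.4, -0.5, 2.1, 1.2, 1.5, 1.1, 1.3, -0.2]
print("2-of-3 beyond 2 sigma:", k_of_m_signals(data, 0.0, 1.0, k=2, m=3, zone=2))
print("4-of-5 beyond 1 sigma:", k_of_m_signals(data, 0.0, 1.0, k=4, m=5, zone=1))
```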
[Plot: Hourly Availability Link #1 (daily samples); Sample Xbar versus Sample Number, with UCL, CL, LCL, and 1-sigma and 2-sigma limits]

Figure 4.26 x-bar Chart for Availability (computed/collected hourly - daily samples)

[Plot: Hourly Availability Link #1 (daily samples); Sample R versus Sample Number, with UCL, CL, LCL, and 1-sigma and 2-sigma limits]

Figure 4.27 R Chart for Availability (computed/collected hourly - daily samples)
4.3.3.2 Proportion of Link Up-checks. The measure p-link has one possible control charting technique: the p chart. This measure is computed over both a 1-hour interval and a 1-day interval. The hourly p chart is shown in Figure 4.28 and the daily p chart is shown in Figure 4.29. Applying the runs rules shows that many points are plotting out-of-control on the hourly chart. Once again, this is because failures of an individual link (1 failure/3.4 hours) are much less frequent than the link checks (1 check/300 seconds). Not enough failures are occurring between checks to compute an accurate value of this proportion, p-link, and too many values of 1 are being computed. As with Availability, the data should be collected on a less frequent basis, and the proportion should be computed over an interval of one day with the current individual link failure rate. Looking now at the daily chart in Figure 4.29, there are still several out-of-control points and two runs above the 2-sigma limit. The same problem still exists as for Availability: not enough failures are occurring every 300 seconds. Computing the proportion over a day's interval rather than an hour's interval will not fix this problem, because the 300-second check data are still being used. If a longer time interval were used for data collection, then the p chart would be well suited to monitor the performance measure p-link, since this measure fits the conditions for the binomial probability model in Section 3.1.3.
[Plot: Hourly p-link Link #1; Sample p]

Figure 4.28 p Chart for p-link (collected 1/300 sec - computed hourly)


4.3.3.3 Time to Failure/Time to Repair/Time Between Failures. The measures TTF, TTR, and TBF have one possible control charting technique: XmR charts. The XmR charts for TTF are shown in Figure 4.30 and Figure 4.31, the XmR charts for TTR are shown in Figure 4.32 and Figure 4.33, and the XmR charts for TBF are shown in Figure 4.34 and Figure 4.35. Applying the runs rules shows that only one point is plotting out-of-control (Sample 24, the 24th failure), on both the X chart for TTF and the X chart for TBF. Since TBF is the sum of TTF and TTR, a correspondence between TTF and TBF is expected. Because the network is known to be in control, this single out-of-control point is a chance occurrence arising from the random draws from the exponential distributions in the simulation.

These three performance measures complement each other well when monitored together. An out-of-control point on a TBF chart should be accompanied by an out-of-control point on either the TTF or the TTR charts. Care must be taken, though, since TTF may dominate TTR if its mean value is significantly larger than TTR's (as in this case study). However, an out-of-control point on the TBF chart with a corresponding out-of-control point on the TTF chart does not justify automatically disregarding the TTR chart; both TTF and TTR may be contributing. Because the TBF charts act as an aggregate of the measures TTF and TTR, an out-of-control point on the TBF charts should be cross-checked with both the TTF charts and the TTR charts to determine which is contributing to the out-of-control condition. If only one measure can be monitored, TBF should be monitored to detect shifts in both TTF and TTR. But if two measures can be monitored, TTF and TTR are sufficient without the additional charting of TBF. These measures will be particularly useful in helping to identify the cause of degradation, discussed in the next section.
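The cross-check described above can be sketched for a single failure/repair cycle. This minimal Python sketch assumes TBF = TTF + TTR and uses the X chart upper limits computed in Section 4.3.1; the lower limits are 0, so only upper limits are tested, and runs rules are ignored. The function name and the example values are hypothetical.

```python
# X chart upper limits computed in Section 4.3.1 (the LCLs are 0 for all three charts)
UCL = {"TTF": 49036, "TTR": 3016, "TBF": 52052}

def cross_check(ttf, ttr):
    """Flag which of the TTF, TTR and TBF X charts a single failure/repair cycle
    exceeds, where TBF is taken as TTF + TTR as in the text."""
    values = {"TTF": ttf, "TTR": ttr, "TBF": ttf + ttr}
    return {name: values[name] > UCL[name] for name in UCL}

# Hypothetical cycles: a long time to failure, then a long repair time
print(cross_check(ttf=52000, ttr=900))    # TBF signal traced to TTF
print(cross_check(ttf=47000, ttr=5200))   # TBF signal traced to TTR
```

The second example shows why the TTR chart should not be disregarded: the TBF signal there comes from an unusually long repair, even though TTF itself stays within its limit.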
Figure 4.34 X Chart for TBF Link #1

[Plot: TBF Link #1; m Range versus Sample Number, with UCL(mR), CL(mR), and 1-sigma and 2-sigma limits]

Figure 4.35 mR Chart for TBF Link #1
One additional concern with these measures is that TTF and TTR are known to have exponential distributions. Even though data from any distribution can be plotted on control charts, knowing the data's underlying distribution indicates how the data should behave on the control chart. Control charts are designed with the normal distribution in mind, and hence the control limits are computed for normally distributed data.

Comparing the exponential distribution's relationship to the control limits with the normal distribution's relationship shows a moderate difference with respect to the center line and the 1-sigma limits. For the exponential distribution, approximately 63% of the points should plot below the center line, as compared to 50% of the points for the normal distribution. Also, for the exponential distribution, 98.2% of the points should plot within the 3-sigma limits, as compared to 99.7% for the normal distribution (28:60-4). Hence, knowing the distribution of the data could help in identifying a pattern due to the underlying distribution instead of an assignable cause, as might be indicated by the runs rules. This knowledge of the distribution could thus prevent a false alarm of an out-of-control condition.
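The percentages above follow from the exponential distribution with σ = μ: the fraction below the center line is 1 - e^(-1), and the 3-sigma band [0, 4μ] contains 1 - e^(-4) of the distribution. A minimal Python sketch computing these values, and the corresponding normal-theory values for comparison, follows; it assumes an exact exponential distribution, whereas real failure data would only approximate one, and the function names are hypothetical.

```python
import math

def exponential_fraction_below_mean():
    """P(X < mu) for an exponential distribution: 1 - e^-1."""
    return 1 - math.exp(-1)

def exponential_fraction_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for an exponential, where sigma = mu."""
    lower = max(0.0, 1 - k)          # in units of mu; the support starts at 0
    upper = 1 + k
    return math.exp(-lower) - math.exp(-upper)

def normal_fraction_within(k):
    """P(|Z| < k) for a standard normal distribution."""
    return math.erf(k / math.sqrt(2))

print(f"below center line: exponential {exponential_fraction_below_mean():.3f}, normal 0.500")
for k in (1, 2, 3):
    print(f"within {k}-sigma: exponential {exponential_fraction_within(k):.4f}, "
          f"normal {normal_fraction_within(k):.4f}")
```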

4.3.4  Degradation  Monitoring. 

Once again, network degradation will be monitored indirectly through the observation of other performance measures, and the runs rules allow this indirect monitoring by detecting decaying conditions or trends, just as they did for the overall performance measures and the (s-t) performance measure. Assuming that degradation of the network reveals itself as more links failing over time, and as shorter TTFs and/or longer TTRs over time, trends can be monitored with the performance measures p-link, Availability, TTF, TTR, and TBF. A slow decrease of p-link is an indication of degradation in the network, due to the first assumption that more links fail over time in a degrading network. Availability will be affected by a change in TTF: shorter TTFs (less uptime) will cause a decrease in Availability. Finally, monitoring all three TTF, TTR, and TBF charts for trends (runs) will help locate the cause. A combination of TBF decreases and TTR increases points to problems with either the repair facilities or the magnitude of the link's failure. A combination of TBF decreases and TTF decreases points to problems such as high network loading, low-quality repairs, or equipment wearing out. The indications from all three measures could reflect any combination of these example problems. Degradation can be easily detected in this manner.
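The diagnostic logic above amounts to a small lookup from trend directions to candidate causes. A minimal Python sketch follows; the trend labels are assumed to come from applying the runs rules to each chart, and the function name and string labels are hypothetical. It only lists candidates, since several causes may act together.

```python
def degradation_hints(tbf_trend, ttf_trend, ttr_trend):
    """Map trend directions ("down", "up" or "flat") detected on the TBF, TTF and
    TTR charts to the example causes named in the text. Several hints may apply."""
    hints = []
    if tbf_trend == "down" and ttr_trend == "up":
        hints.append("repair facilities or magnitude of the link's failures")
    if tbf_trend == "down" and ttf_trend == "down":
        hints.append("high network loading, low-quality repairs, or equipment wear-out")
    return hints

print(degradation_hints(tbf_trend="down", ttf_trend="flat", ttr_trend="up"))
```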


4.3.4.1 Degradation Case Study. Now the degradation of Link 13 will be monitored through individual link performance measures. First, the daily Availability of Link 13 is plotted on XmR charts, shown in Figure 4.36 and Figure 4.37. For this demonstration, the degradation occurred in Sample 16 (a daily measure is now being used, as compared to an hourly measure for p-down and p-path). Looking at the X chart in Figure 4.36, the abrupt decrease in the link's availability is easily seen in Sample 16. There is also a corresponding out-of-control indication on the mR chart in Figure 4.37. If the change were not so abrupt, though, runs rules would be used to indicate an out-of-control condition. Next, the TTFs for Link 13 are plotted on the XmR charts shown in Figure 4.38 and Figure 4.39. The degradation occurred after Sample 30 (the 30th failure; this measure's samples are taken at each failure, not at a set interval). The X chart in Figure 4.38 clearly shows the abrupt decrease in Link 13's TTF after Sample 30. The mR chart in Figure 4.39 corresponds to this abrupt shift, as it should. As stated earlier, if the shift were not so abrupt, a gradual decrease in TTF would be seen and the runs rules would help to detect the degradation. Both the Availability and TTF measures for Link 13 easily detected the degradation of Link 13.
Figure 4.38 X Chart for TTF Link 13 (degraded)
4.4 Level of Service (LOS) Agreements

In establishing specifications for a LOS Agreement using control charts, there is a very important point regarding the typical relationship between specification limits and control limits, as stated by Montgomery: “there is no connection or relationship [mathematical or statistical] between the control limits on the ... charts and the specification limits of the process” (18:213). Control limits are based on the natural variability of a process, while specification limits are determined externally to the process (i.e., by management or by a customer) (18:213). Instead, the natural variability of a process defines the natural tolerance limits of the process. These limits are located 3σ above and 3σ below the process mean. So while the control and specification limits are not related, it is helpful to know the inherent process variability before setting specification limits (18:213-14). The empirical rule states that: “Given a homogeneous set of data:

1. Roughly 60% to 75% of the data will be located within 1 sigma unit on either side of the average.

2. Usually 90% to 98% of the data will be located within 2 sigma units on either side of the average.

3. Approximately 99% to 100% of the data will be located within 3 sigma units on either side of the average.” (28:61)

where a ‘sigma unit’ is equal to the standard deviation of the data. Therefore, it is easily seen that the natural tolerance limits of a process contain approximately 99% to 100% of the data from that process.

Using this information, when setting the specification limits on a performance measure, the mean and standard deviation (the standards) of this measure should be known. Therefore, if the theoretical standards are known, they should be used to find the natural tolerance limits of the performance measure; whereas if the theoretical standards are unknown, they must be estimated from the data in order to determine the natural tolerance limits. Once the natural tolerance limits are known, it is recommended that both the upper and lower specification limits lie more than 3σ from the mean. If this is accomplished, then virtually all values of the performance measure will fall within the specification limits as long as the process stays reasonably in control (28:124). If desired, the specification limits may be plotted on a chart of individual measurements (i.e., an X chart or p chart) but not on a chart of averages (x-bar chart) (18:214). The specification limits are not limits for the average of the performance measure; they are limits for individual values. Hence, the LOS Agreement specifications for each performance measure, which can be determined by comparison to the natural tolerance limits (±3σ from the mean), can be plotted on a chart containing the individual values of that performance measure. This facilitates easy monitoring of conformance to those specifications.
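The recommendation above can be expressed as a simple check. This minimal Python sketch, assuming the standards μ and σ of a performance measure are known, computes the natural tolerance limits μ ± 3σ and tests whether proposed specification limits lie at or beyond them. The function names and the Availability numbers in the example are hypothetical, not values from the case study.

```python
def natural_tolerance_limits(mu, sigma):
    """Natural tolerance limits of a process: mean plus or minus 3 sigma."""
    return mu - 3 * sigma, mu + 3 * sigma

def specs_outside_natural_limits(lsl, usl, mu, sigma):
    """True if both specification limits lie at or beyond the natural tolerance
    limits, so virtually all in-control values should meet the specification."""
    lntl, untl = natural_tolerance_limits(mu, sigma)
    return lsl <= lntl and usl >= untl

# Hypothetical daily Availability standards and candidate LOS specifications
mu, sigma = 0.95, 0.01
print(natural_tolerance_limits(mu, sigma))                  # approximately (0.92, 0.98)
print(specs_outside_natural_limits(0.90, 1.00, mu, sigma))  # True
print(specs_outside_natural_limits(0.93, 1.00, mu, sigma))  # False
```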


4.5  Summary 

The performance measures for three monitoring viewpoints, identified in Chapter 2, and their applicable control charts were applied to a case study network. For some measures, more than one chart type was applied and evaluated for appropriateness. The EXCEL spreadsheets and macros containing the control chart procedures in Chapter 3 provided the means for accomplishing this evaluation. Of all the measures evaluated, many seem to be useful indicators of network performance that can be calculated from the data currently being collected from the network (the log of failure and repair times). The overall performance measures, namely the number of down links (DwnLnk), the proportion of down links (p-down), and the proportion of up links (p-up), all seem to be excellent indicators of the network's status and performance, and the data for them are easy to collect: all that is needed is a count of links up or down.

Only one of these measures needs to be monitored, though, since they are all so closely related. The choice is up to the network monitors and what makes the most sense to them. The (s-t) measure p-path is highly variable and proved to be quite volatile if the ‘right combination of links’ were to fail at once. But then again, these are the assignable causes that are of interest. In using this measure, though, one must be aware that there may be false alarms of an out-of-control condition arising from the inherent variability of the network in relation to the paths.

Finally, the individual link measures of Availability, p-link, TTF, TTR, and TBF all seem valuable. At the individual level, a problem link can be monitored on its own to reveal its individual assignable causes for being out-of-control. Also, the relationships between TTF, TBF, and TTR can be extremely useful in pinpointing where the cause of an out-of-control condition is coming from (e.g., maintenance problems, degradation problems, etc.).

In addition, two important issues were identified relating to proper control chart construction: the proper data collection rate and the proper subgrouping of the data. If either is done incorrectly, the control chart will not provide useful information.
5. Conclusions and Recommendations


5.1  Thesis  Objectives 

The  first  primary  objective  of  this  research  was  to  identify  and  evaluate  possible 
statistical  process  control  methods  (primarily  control  charts)  that  could  be  used  to 
proactively  monitor  communication  network  performance  over  time.  This  was 
accomplished  through  a  review  and  evaluation  of  previous  work  in  this  field  in  Chapter  2. 
From  this  review,  several  possible  performance  measures  were  investigated  to  represent 
the  reliability,  availability,  and  degradation  of  the  sponsor’s  communication  network  over 
time.  An  explicit  degradation  performance  measure  was  not  chosen  since  it  can  be 
readily  monitored  through  the  other  performance  measures.  These  measures  were  chosen 
on  the  basis  of  their  computability  from  directly  observable  network  data. 

Next, control charts were deemed the most appropriate SPC technique for monitoring the chosen performance measures. Appropriate ‘candidate’ control charts were identified for each performance measure, and proper procedures for constructing the charts were discussed. A case study was then undertaken to demonstrate control charting techniques as well as the appropriateness of the proposed performance measures.

During  the  case  study,  a  computer  simulation  was  created  to  generate  data 
representative  of  the  observable  data  from  the  sponsor’s  communication  network.  This 
simulation  model  was  based  on  a  queueing  model  which  described  the  theoretical 
expected  performance  of  the  network.  These  theoretical  insights  were  then  used  to 
develop  ‘standards’  for  some  of  the  performance  measures’  control  charts.  Other 
theoretical  properties,  for  example,  the  binomial  distribution,  were  also  used  to  develop 
‘standards’  for  yet  more  of  the  measures’  control  charts.  In  fact,  only  the  standards  of 
one measure, the proportion of operating paths (p-path), needed to be estimated. An
investigation  of  each  measure  and  its  control  charts  provided  insights  on  the  ‘best’ 
measures  and  their  corresponding  ‘best’  monitoring  techniques  (i.e.,  control  charts).  In 
addition,  degradation  monitoring  was  demonstrated  for  each  appropriate  measure. 

In accordance with the second primary objective, ‘to automate the best of the identified SPC methods into a user friendly software package,’ and in order to complete the above analysis, the identified ‘best’ control charting techniques were incorporated into a software package of EXCEL (8) spreadsheets and macros.

Finally,  as  required  by  the  third  primary  objective  to  relate  these  SPC  methods  to 
LOS  Agreements,  methods  were  discussed  on  possible  uses  of  control  charting 
techniques  to  establish  and  monitor  LOS  Agreements  and  their  specifications. 

Overall, various performance measures are available even from the limited data assumed to be observable from the network. The choice of which measure to use depends on which of them is most understandable to the network's controllers. This choice also depends on any specific desires or concerns the network controllers have about the network (e.g., monitoring on an overall network level is preferred to monitoring on a link level if such low-level monitoring of the links has been deemed unnecessary). Alternatively, if there is a suspected problem on a certain portion of the network, but the cause is unknown, this specific part of the network (e.g., a specific link or links) may be all that needs to be monitored. The choice here is user and network ‘need dependent.’


5.2  Recommendations 

The performance measures identified in this research and the control charting techniques demonstrated are uniquely applicable to the sponsor's need to “proactively monitor the reliability, availability, and degradation of networks,” no matter what the desired end result is (e.g., LOS Agreements, regular monitoring, etc.). Control charting is a straightforward method of real-time (or near-real-time) monitoring applicable to many different systems, and this study provides the sponsor a new means with which to assure that LOS Agreements are fulfilled.

Future research is recommended on CUSUM and EWMA charts, which are more sensitive to small shifts in a process, if detecting such shifts is a concern for the sponsor’s network. Also, if any new data from the communication network become available, such as time delay or bit error rate, the control chart procedures can be applied to them by following the procedures demonstrated here. In addition, as this thesis effort was just ending, a new research effort was published by Buchsbaum and Mihail (6). In their paper they propose a heuristic based on Monte Carlo and Markov simulation techniques in order to approximate various reliability parameters of communication networks with link failures (6: 117). It is recommended that this new research be investigated for possible application to the sponsor’s network. Time did not permit any investigation or application of this paper in the current research effort.
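To illustrate why the EWMA chart recommended above reacts to small, sustained shifts, the following is a minimal Python sketch that was not part of this study. It assumes the in-control mean mu0 and standard deviation sigma of the monitored measure (for example, the number of links down at each 300-second state check) are known, and uses the common textbook defaults lam = 0.2 and L = 3; the function name ewma_chart is hypothetical.

```python
import math
from typing import List, Tuple


def ewma_chart(xs: List[float], mu0: float, sigma: float,
               lam: float = 0.2, L: float = 3.0) -> List[Tuple[float, float, float, bool]]:
    """Return (z_t, LCL_t, UCL_t, out_of_control) for each observation x_t.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, starting from z_0 = mu0.
    The limits use the exact, time-varying EWMA variance:
    mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 t))).
    """
    results = []
    z = mu0
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z          # exponentially weighted moving average
        width = L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        lcl, ucl = mu0 - width, mu0 + width
        results.append((z, lcl, ucl, not (lcl <= z <= ucl)))
    return results


# A sustained shift of two standard deviations (mu0 = 0, sigma = 1).
for z, lcl, ucl, flag in ewma_chart([2.0] * 5, mu0=0.0, sigma=1.0):
    print(f"z = {z:.3f}  limits = ({lcl:.3f}, {ucl:.3f})  signal = {flag}")
```

With these settings no single value of 2.0 would ever fall outside Shewhart limits of ±3σ, while the EWMA statistic crosses its upper limit at the third sample; a CUSUM chart would be an equally reasonable choice for the same purpose.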
APPENDIX F: Input Network File and Path Enumeration Output


Input Network Description File - Case Study Network

File layout: number of nodes; number of links; then one line per link giving origin node / destination node / link number.

10 11 10
11 23 11
12 16 19
12 20 23
12 21 24
13 23 26
13 24 27
13 25 28
13 26 29
13 27 30
13 28 31
14 23 33
14 24 34
14 25 35
14 26 36
14 27 37
14 28 38
15 31 40
15 32 41
16 30 42
16 31 43
16 32 44
16 33 45
16 36 46
17 30 47
17 33 48
17 34 49
18 38 50
19 39 51
20 35 52
21 36 53
22 37 54
23 41 61
24 41 62
25 41 63
26 41 64
27 41 65
28 41 66
29 41 67
30 41 68
31 41 69
32 41 70
33 41 71
34 41 72
35 41 73
36 41 74
37 41 75
38 41 76
39 41 77
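The FORTRAN routine input (Appendix E) reads this file with format i2 for the node count, i3 for the link count, and (i2,1x,i2,1x,i3) for each origin/destination/link record. A minimal Python sketch of an equivalent reader is shown below; it splits on whitespace rather than on fixed columns, which is an assumption that holds for files laid out like the listing above, and the function name read_network is hypothetical.

```python
from typing import Dict, List, Tuple


def read_network(text: str) -> Tuple[int, Dict[int, List[int]], Dict[Tuple[int, int], int]]:
    """Parse a network description: node count, link count, then 'o d link' records.

    Returns (number_of_nodes, adjacency, link_number), where adjacency[o] lists the
    destinations of node o in file order (like arc(o, 1..arc(o,0)) in the FORTRAN)
    and link_number[(o, d)] gives the link number (like num(o, d)).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    nn = int(lines[0])
    na = int(lines[1])
    adjacency: Dict[int, List[int]] = {i: [] for i in range(1, nn + 1)}
    link_number: Dict[Tuple[int, int], int] = {}
    for ln in lines[2:2 + na]:
        o, d, lk = (int(tok) for tok in ln.split())
        adjacency[o].append(d)
        link_number[(o, d)] = lk
    if len(link_number) != na:
        raise ValueError(f"expected {na} links, found {len(link_number)}")
    return nn, adjacency, link_number


# Hypothetical four-node example in the same layout.
sample = "4\n5\n1 2 1\n1 3 2\n2 3 3\n2 4 4\n3 4 5\n"
nn, adj, num = read_network(sample)
```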
Factors for constructing variables control charts


APPENDIX  B:  Steady-State  Equations  for  an  M/M/s  Queueing  Model 


M/M/s  Model  with  finite  calling  population  (N) 

Assumes: All interarrival times are iid exponential with mean rate λ

All service (repair) times are iid exponential with mean rate μ

N = link population

n = number of links in the queueing system (number of links down) = 0, 1, 2, ..., N
s = number of servers (repairmen) = N (in this case)

The software package MATHCAD Version 5.0 Plus was used to solve the steady-state equations for the M/M/s queueing model. The known parameters are:

N := number of links in network    n := 0..N    μ := 1/754


The steady-state equations are (for N = s):

$$P_0 = \left[\sum_{n=0}^{N} \frac{N!}{(N-n)!\,n!}\left(\frac{\lambda}{\mu}\right)^{n}\right]^{-1}, \qquad P_n = \frac{N!}{(N-n)!\,n!}\left(\frac{\lambda}{\mu}\right)^{n} P_0, \quad n = 1, 2, \dots, N,$$

where P_n is the probability of being in state n (n links are down), and

$$\bar{\lambda} = \sum_{n=0}^{N} (N-n)\,\lambda\,P_n,$$

where λ̄ is the average arrival rate to the queue in the long run, and

$$L_q = \sum_{n=s}^{N} (n-s)\,P_n = 0, \qquad L = \sum_{n=0}^{N-1} n\,P_n + L_q + N\left(1 - \sum_{n=0}^{N-1} P_n\right),$$

where L_q is the expected queue length (zero in this case since N = s) and L is the expected number of links in the queueing system. Solving the λ̄ equation for λ, substituting in the P_n's and N, and then re-solving the equation for λ allows λ to be found. However, once the P_n's have been substituted, re-solving for λ was a task accomplished by MATHCAD, and the resulting equation is too large to show on a single page.

Here the λ̄ equation is solved for λ:

$$\bar{\lambda} = \sum_{n=0}^{N} (N-n)\,\lambda\,P_n = N\lambda P_0 + (N-1)\lambda P_1 + (N-2)\lambda P_2 + (N-3)\lambda P_3 + \dots + \lambda P_{N-1}$$

$$\lambda = \frac{\bar{\lambda}}{N P_0 + (N-1)P_1 + (N-2)P_2 + (N-3)P_3 + \dots + P_{N-1}} = \frac{\bar{\lambda}}{\sum_{n=0}^{N} (N-n)\,P_n}$$

At this point, to solve for the value of λ, the P_n's are substituted in, as well as the values for N and μ, and MATHCAD's symbolic processor takes over by expanding the summations and solving for λ.
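The MATHCAD solution can be cross-checked numerically. Because s = N, every failed link has its own repairman, so the P_n above are binomial and the λ̄ equation can be solved for λ in closed form: λ̄ = λ(N − L) with L = Nλ/(λ + μ), which rearranges to 1/λ = N/λ̄ − 1/μ. The Python sketch below is an illustrative check, not the procedure used in this thesis; it assumes the long-run failure rate λ̄ = 1/169 per second taken from the EXPON(169) CREATE node of the SLAM II model in Appendix E, and the function names are hypothetical.

```python
from math import comb
from typing import List, Tuple


def solve_failure_rate(lam_bar: float, mu: float, n_links: int) -> float:
    """Per-link failure rate lambda from lambda_bar = sum (N - n) * lambda * P_n, s = N.

    With s = N the P_n are binomial, so lambda_bar = lambda * N * mu / (lambda + mu),
    which rearranges to 1/lambda = N/lambda_bar - 1/mu.
    """
    inv_lam = n_links / lam_bar - 1.0 / mu
    if inv_lam <= 0:
        raise ValueError("lambda_bar is too large for this N and mu")
    return 1.0 / inv_lam


def steady_state(lam: float, mu: float, n_links: int) -> Tuple[List[float], float]:
    """Return ([P_0, ..., P_N], L) for the finite-source M/M/s queue with s = N."""
    rho = lam / mu
    weights = [comb(n_links, n) * rho ** n for n in range(n_links + 1)]
    p0 = 1.0 / sum(weights)
    probs = [w * p0 for w in weights]
    L = sum(n * p for n, p in enumerate(probs))   # L_q = 0 because s = N
    return probs, L


# Validation network (Appendix C): N = 5, mu = 1/120, assumed lambda_bar = 1/169.
lam = solve_failure_rate(1 / 169, 1 / 120, 5)
probs, L = steady_state(lam, 1 / 120, 5)
print(1 / lam, L, probs[0])
```

Substituting N = 77 and μ = 1/754 gives the case-study values of Appendix D in the same way.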
APPENDIX C: M/M/s Queue Steady-State Results / End-of-Simulation Results
for Validation Network

M/M/s Model with finite calling population (N = 5)

μ = 1/120 per second

N = 5
n = 0, 1, 2, ..., 5
s = N = 5

The equations for the P_n's in Appendix B are expanded for substitution into the λ̄ equation:


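$$P_0 = \left[\sum_{n=0}^{5} \frac{5!}{(5-n)!\,n!}\left(\frac{\lambda}{\mu}\right)^{n}\right]^{-1} = \left[1 + 5\frac{\lambda}{\mu} + 10\frac{\lambda^2}{\mu^2} + 10\frac{\lambda^3}{\mu^3} + 5\frac{\lambda^4}{\mu^4} + \frac{\lambda^5}{\mu^5}\right]^{-1}$$

$$P_1 = \frac{5!}{(5-1)!\,1!}\left(\frac{\lambda}{\mu}\right) P_0, \qquad P_2 = \frac{5!}{(5-2)!\,2!}\left(\frac{\lambda}{\mu}\right)^{2} P_0, \qquad P_3 = \frac{5!}{(5-3)!\,3!}\left(\frac{\lambda}{\mu}\right)^{3} P_0,$$

$$P_4 = \frac{5!}{(5-4)!\,4!}\left(\frac{\lambda}{\mu}\right)^{4} P_0, \qquad P_5 = \frac{5!}{(5-5)!\,5!}\left(\frac{\lambda}{\mu}\right)^{5} P_0$$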
The P_n's and μ can now be substituted into the λ equation for MATHCAD to solve:

$$\lambda = \frac{\bar{\lambda}}{5P_0 + 4P_1 + 3P_2 + 2P_3 + P_4}, \qquad \mu = \frac{1}{120}$$

$$\lambda = \frac{1}{725}$$

Using the derived value of λ, the numerical values for L and the P_n's can now be found:

N := 5

$$L := \sum_{n=0}^{N-1} n\,P_n + N\left(1 - \sum_{n=0}^{N-1} P_n\right)$$

L = 0.7100591716


This value for L is compared to the end-of-simulation (10 months is used here) average number of links down shown in the SLAM II Summary Report as a 'Statistic for Time-Persistent Variable', LnksDwn (this report is attached at the end of this Appendix).

The mean value for LnksDwn = 0.708. The resulting difference from L is approximately 0.002 links. Hence these numbers are in agreement.

In addition, the statistics for the Mean Time to Failure for all links 1-5 were collected and compared to the theoretical 1/λ = 725 seconds for agreement:
Link   Mean Value (s)   Difference from 1/λ = 725 (s)   Standard Deviation (s)
1      727              2                               726
2      718              7                               718
3      731              6                               736
4      726              1                               729
5      730              5                               741

Since the largest difference among these five sample links is 7 seconds and the theoretical mean value is 725 seconds, two orders of magnitude larger, these numbers also agree. In addition, the standard deviations of the MTTFs are on the same order of magnitude as the mean values, supporting the exponential distribution assumption in Appendix B.
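The "same order of magnitude" argument can be made slightly sharper: an exponential distribution has a coefficient of variation (standard deviation divided by mean) of exactly 1, so sample coefficients of variation near 1 are consistent with the exponential assumption. The Python sketch below is an illustration rather than a formal goodness-of-fit test; it computes these ratios and the relative differences from 1/λ for the five links in the table above, and the function name mttf_check is hypothetical.

```python
from typing import Dict, Tuple


def mttf_check(observed: Dict[int, Tuple[float, float]],
               theoretical_mttf: float) -> Dict[int, Tuple[float, float]]:
    """For each link return (coefficient of variation, relative difference from 1/lambda).

    An exponential distribution has CV = sd / mean = 1 exactly.
    """
    out = {}
    for link, (mean, sd) in observed.items():
        out[link] = (sd / mean, abs(mean - theoretical_mttf) / theoretical_mttf)
    return out


# Validation-network values from the table above (seconds): link -> (mean, sd).
validation = {1: (727.0, 726.0), 2: (718.0, 718.0), 3: (731.0, 736.0),
              4: (726.0, 729.0), 5: (730.0, 741.0)}

for link, (cv, rel) in mttf_check(validation, 725.0).items():
    print(f"link {link}: CV = {cv:.3f}, relative difference = {rel:.2%}")
```

All five coefficients of variation fall within about 2% of 1, and every mean lies within 1% of 725 seconds.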

Finally, the values for the P_n's are calculated to show that the probability of the initial condition of no failed links, P_0, is at least on the same order as the probabilities of all the other states and, in fact, that it is the most probable state:

P0 = 0.4649502527
P1 = 0.384786416
P2 = 0.1273775722
P3 = 0.0210831844
P4 = 0.0017448153
P5 = 0.0000577594
SLAM II SUMMARY REPORT

SIMULATION PROJECT NET BY BORGIA

DATE 11/30/1994        RUN NUMBER 1 OF 1

CURRENT TIME .2592E+08

STATISTICAL ARRAYS CLEARED AT TIME .0000E+00

**STATISTICS FOR VARIABLES BASED ON OBSERVATION**

             MEAN       STANDARD   COEFF. OF  MINIMUM    MAXIMUM    NO.OF
             VALUE      DEVIATION  VARIATION  VALUE      VALUE      OBS

MTTF_ALL     .169E+03   .169E+03   .100E+01   .000E+00   .192E+04   ****
MTBF1        .846E+03   .735E+03   .869E+00   .200E+01   .679E+04   ****
MTTF1        .727E+03   .726E+03   .100E+01   .000E+00   .673E+04   ****
MTTR1        .119E+03   .118E+03   .991E+00   .000E+00   .105E+04   ****
MTTR_INST    .120E+03   .686E+00   .571E-02   .115E+03   .235E+03
MTTF2        .718E+03   .718E+03   .100E+01   .000E+00   .717E+04   ****
MTTR2        .120E+03   .121E+03   .101E+01   .000E+00   .142E+04   ****
MTTF3        .731E+03   .736E+03   .101E+01   .000E+00   .713E+04   ****
MTTR3        .120E+03   .120E+03   .996E+00   .000E+00   .143E+04
MTTF4        .726E+03   .729E+03   .100E+01   .000E+00   .838E+04   ****
MTTR4        .119E+03   .119E+03   .998E+00   .000E+00   .141E+04   ****
MTTF5        .730E+03   .741E+03   .101E+01   .000E+00   .856E+04   ****
MTTR5        .120E+03   .118E+03   .989E+00   .000E+00   .117E+04
MTBF2        .838E+03   .728E+03   .869E+00   .300E+01   .762E+04   ****
MTBF3        .851E+03   .745E+03   .875E+00   .200E+01   .725E+04   ****
MTBF4        .845E+03   .737E+03   .873E+00   .400E+01   .837E+04
MTBF5        .850E+03   .749E+03   .882E+00   .300E+01   .876E+04

**STATISTICS FOR TIME-PERSISTENT VARIABLES**

             MEAN     STANDARD   MINIMUM   MAXIMUM   TIME        CURRENT
             VALUE    DEVIATION  VALUE     VALUE     INTERVAL    VALUE

LNKSUP       4.292    .837       .00       5.00      *********   4.00
LNKSDWN       .708    .837       .00       5.00      *********   1.00

**REGULAR ACTIVITY STATISTICS**

ACTIVITY      AVERAGE       STANDARD    MAXIMUM   CURRENT   ENTITY
INDEX/LABEL   UTILIZATION   DEVIATION   UTIL      UTIL      COUNT

1             .7068         .8393       5         1         153217
APPENDIX D: M/M/s Queue Steady-State Results / End-of-Simulation Results
for Case Study Network

M/M/s Model with finite calling population (N = 77)
μ = 1/754 per second
N = 77
n = 0, 1, 2, ..., 77
s = N = 77

The equations for the P_n's in Appendix B are expanded and substituted into the λ̄ equation as was shown in Appendix C for the small validation network.


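$$P_0 = \left[\sum_{n=0}^{77} \frac{77!}{(77-n)!\,n!}\left(\frac{\lambda}{\mu}\right)^{n}\right]^{-1}, \qquad P_n = \frac{77!}{(77-n)!\,n!}\left(\frac{\lambda}{\mu}\right)^{n} P_0$$

$$\bar{\lambda} = \sum_{n=0}^{77} (77-n)\,\lambda\,P_n \quad\Longrightarrow\quad \lambda = \frac{\bar{\lambda}}{\sum_{n=0}^{77} (77-n)\,P_n}$$

So then:

$$\lambda = \frac{1}{12259}$$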
Using the derived value of λ, the numerical values for L and the P_n's can now be found:

N := 77    λ := 1/12259

$$L := \sum_{n=0}^{N-1} n\,P_n + N\left(1 - \sum_{n=0}^{N-1} P_n\right)$$

L = 4.461538


This value for L is compared to the end-of-simulation (2 months is used here) average number of links down shown in the SLAM II Summary Report as a 'Statistic for Time-Persistent Variable', LnksDwn (this report is attached at the end of the Appendix).

The mean value for LnksDwn = 4.4325. The resulting difference from L is approximately 0.029 links. Hence these numbers are in agreement.

In addition, the statistics for the Mean Time to Failure for links 2-5 and 16 (chosen arbitrarily) were collected and are compared to the theoretical 1/λ = 12259 seconds for agreement:

Link   Mean Value (s)   Difference from 1/λ = 12259 (s)   Standard Deviation (s)
16     12200            59                                12400
2      12800            541                               13800
3      12000            259                               12300
4      12500            241                               12500
5      12200            59                                12300
Since the largest difference among these five sample links is 541 seconds and the theoretical mean value is 12259 seconds, two orders of magnitude larger, these numbers also agree. In addition, the standard deviations of the MTTFs are on the same order of magnitude as the mean values, supporting the exponential distribution assumption in Appendix B.

Finally, the values for the P_n's are calculated to show that the probability of the initial condition of no failed links, P_0, is on the same order as the probabilities of the other most probable states:


P0 = 0.010092
P1 = 0.047795
P2 = 0.111708
P3 = 0.171767
P4 = 0.195446
P5 = 0.175508
P6 = 0.129537
P7 = 0.080811
P8 = 0.043491
P9 = 0.020508
P10 = 0.008577
P11 = 0.003213
P12 = 0.001087
P13 = 0.000334
P14 = 0.000094
P15 = 0.000024
  ⋮
P20 = 8.919699 × 10^-9
  ⋮
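Because the 77 links are repaired independently (s = N), the P_n above can be read as a binomial distribution with per-link down probability p = λ/(λ + μ) = 754/13013, so the most probable number of failed links is the binomial mode ⌊(N + 1)p⌋. The short Python check below is an illustration under that binomial reading, not part of the original MATHCAD worksheet; it confirms that P4 is the largest probability and that P0 is of the same order as P10.

```python
from math import comb, floor

N = 77
MTTF, MTTR = 12259.0, 754.0          # 1/lambda and 1/mu in seconds (this appendix)
p = MTTR / (MTTF + MTTR)             # = lambda / (lambda + mu), per-link P(down)

# Binomial probabilities of n links down, n = 0..N.
probs = [comb(N, n) * p ** n * (1 - p) ** (N - n) for n in range(N + 1)]
mode = max(range(N + 1), key=lambda n: probs[n])

print(f"P0 = {probs[0]:.6f}, P4 = {probs[4]:.6f}, P10 = {probs[10]:.6f}")
print("most probable state:", mode, "| binomial mode formula:", floor((N + 1) * p))
print("L = N * p =", N * p)
```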
SLAM II SUMMARY REPORT

SIMULATION PROJECT NET BY BORGIA

DATE 4/1/1995        RUN NUMBER 1 OF 1

CURRENT TIME .5184E+07

STATISTICAL ARRAYS CLEARED AT TIME .0000E+00

**STATISTICS FOR VARIABLES BASED ON OBSERVATION**

               MEAN       STANDARD   COEFF. OF  MINIMUM    MAXIMUM    NO.OF
               VALUE      DEVIATION  VARIATION  VALUE      VALUE      OBS

MTTF_ALL       .170E+03   .170E+03   .998E+00   .000E+00   .192E+04   ****
MTBF16         .130E+05   .124E+05   .955E+00   .630E+02   .839E+05   400
MTTF16         .122E+05   .124E+05   .101E+01   .306E+01   .838E+05   400
MTTR16         .740E+03   .834E+03   .113E+01   .100E+01   .604E+04   400
MTTR_INST      .759E+03   .106E+02   .139E-01   .293E+03   .827E+03   ****
MTTF2          .128E+05   .138E+05   .108E+01   .303E+02   .980E+05   382
MTTR2          .781E+03   .799E+03   .102E+01   .100E+01   .557E+04   382
MTTF3          .120E+05   .123E+05   .103E+01   .380E+02   .989E+05   407
MTTR3          .722E+03   .741E+03   .103E+01   .500E+00   .530E+04   407
MTTF4          .125E+05   .125E+05   .100E+01   .258E+02   .881E+05   391
MTTR4          .750E+03   .733E+03   .977E+00   .125E+01   .438E+04   391
MTTF5          .122E+05   .123E+05   .101E+01   .113E+02   .842E+05   398
MTTR5          .769E+03   .754E+03   .981E+00   .319E+01   .383E+04   398
MTBF2          .136E+05   .138E+05   .102E+01   .245E+03   .985E+05   382
MTBF3          .127E+05   .123E+05   .971E+00   .880E+02   .990E+05   407
MTBF4          .132E+05   .125E+05   .946E+00   .156E+03   .892E+05   391
MTBF5          .130E+05   .123E+05   .948E+00   .444E+03   .855E+05   398
LKS_UP_300     .726E+02   .211E+01   .291E-01   .630E+02   .770E+02   ****
LKS_DWN_300    .445E+01   .211E+01   .475E+00   .000E+00   .140E+02
PATHS_DWN_300  .622E+02   .410E+02   .660E+00   .000E+00   .194E+03

**STATISTICS FOR TIME-PERSISTENT VARIABLES**

               MEAN     STANDARD   MINIMUM   MAXIMUM   TIME        CURRENT
               VALUE    DEVIATION  VALUE     VALUE     INTERVAL    VALUE

LNKSUP         72.550   2.111      62.00     77.00     *********   72.00
LNKSDWN         4.450   2.111        .00     15.00     *********    5.00

**REGULAR ACTIVITY STATISTICS**

ACTIVITY      AVERAGE       STANDARD    MAXIMUM   CURRENT   ENTITY
INDEX/LABEL   UTILIZATION   DEVIATION   UTIL      UTIL      COUNT

1             4.4504        2.1114      15        5         30498
APPENDIX E: SLAM II and FORTRAN Simulation Code

[Figure: SLAM II Network Model - Graphic Representation, Pages 1-4]


SLAM  II  Network  Model  -  Statement  Representation 


GEN,BORGIA,NET,4/1/1995,1,Y,Y,Y/Y,Y,Y/1,72;
LIMITS,,4,100;
ARRAY(1,22);
ARRAY(2,22);
ARRAY(3,22);
ARRAY(4,22);
ARRAY(5,22);
ARRAY(6,22);
ARRAY(7,22);
ARRAY(8,22);
ARRAY(9,22);
ARRAY(10,22);
ARRAY(11,22);
ARRAY(12,22);
ARRAY(13,22);
ARRAY(14,22);
ARRAY(15,22);
ARRAY(16,22);
ARRAY(17,22);
ARRAY(18,22);
ARRAY(19,22);
ARRAY(20,22);
ARRAY(21,22);
ARRAY(22,22);
ARRAY(23,22);
ARRAY(24,22);
ARRAY(25,22);
ARRAY(26,22);
ARRAY(27,22);
ARRAY(28,22);
ARRAY(29,22);
ARRAY(30,22);
ARRAY(31,22);
ARRAY(32,22);
ARRAY(33,22);
ARRAY(34,22);
ARRAY(35,22);
ARRAY(36,22);
ARRAY(37,22);
ARRAY(38,22);
ARRAY(39,22);
ARRAY(40,22);
ARRAY(41,22);
ARRAY(42,22);
ARRAY(43,22);
ARRAY(44,22);
ARRAY(45,22);
ARRAY(46,22);
ARRAY(47,22);
ARRAY(48,22);
ARRAY(49,22);
ARRAY(50,22);
ARRAY(51,22);
ARRAY(52,22);
ARRAY(53,22);
ARRAY(54,22);
ARRAY(55,22);
ARRAY(56,22);
ARRAY(57,22);
ARRAY(58,22);
ARRAY(59,22);
ARRAY(60,22);
ARRAY(61,22);
ARRAY(62,22);
ARRAY(63,22);
ARRAY(64,22);
ARRAY(65,22);
ARRAY(66,22);
ARRAY(67,22);
ARRAY(68,22);
ARRAY(69,22);
ARRAY(70,22);
ARRAY(71,22);
ARRAY(72,22);
ARRAY(73,22);
ARRAY(74,22);
ARRAY(75,22);
ARRAY(76,22);
ARRAY(77,22);
TIMST,XX(16),LNKSUP;
TIMST,XX(15),LNKSDWN;
NETWORK;
INITIALIZE,,7776000,Y;
MONTR,SUMRY,2592000,2592000;
FIN;
;FILE NET.NET, NODE LABEL SEED ZAAA
CREATE,EXPON(169),169,1,,1;
ACTIVITY;
ASSIGN,XX(1)=XX(1)+1,XX(2)=ATRIB(1)-XX(18),XX(18)=ATRIB(1),1;
ACTIVITY;
COLCT,XX(2),MTTF_ALL,,1;
ACTIVITY;
PKLNK ASSIGN,ATRIB(2)=UNFRM(0,77),1;
ACTIVITY;
EVENT,1,1;
ACTIVITY,,XX(3).EQ.0;
ACTIVITY,,XX(3).EQ.2,WARN;
ACTIVITY,,XX(3).EQ.1,PKLNK;
ASSIGN,XX(15)=XX(15)+1,XX(16)=XX(16)-1,1;
ACTIVITY,,XX(20).EQ.1;
ACTIVITY,,XX(20).EQ.2,ZAAG;
ACTIVITY,,XX(20).EQ.3,ZAAH;
ACTIVITY,,XX(20).EQ.4,ZAAI;
ACTIVITY,,XX(20).EQ.5,ZAAJ;
ACTIVITY,,XX(20).EQ.0,RPAI;
COLCT,XX(21),MTBF16,,1;
ACTIVITY;
RPAIR ASSIGN,ATRIB(4)=EXPON(754),1;
ACTIVITY;
EVENT,6,1;
ACTIVITY/1,ATRIB(4);
EVENT,2,1;
ACTIVITY,,XX(20).EQ.1;
ACTIVITY,,XX(20).EQ.2,ZAAC;
ACTIVITY,,XX(20).EQ.3,ZAAD;
ACTIVITY,,XX(20).EQ.4,ZAAE;
ACTIVITY,,XX(20).EQ.5,ZAAF;
ACTIVITY,,XX(20).EQ.0,ZAAB;
COLCT,XX(26),MTTF16,,1;
ACTIVITY;
COLCT,XX(31),MTTR16,,1;
ACTIVITY;
ZAAB COLCT,XX(7),MTTR_INST,,1;
ACTIVITY;
ASSIGN,XX(15)=XX(15)-1,XX(16)=XX(16)+1,1;
ACTIVITY;
TERMINATE;
ZAAC COLCT,XX(27),MTTF2,,1;
ACTIVITY;
COLCT,XX(32),MTTR2,,1;
ACTIVITY,,,ZAAB;
ZAAD COLCT,XX(28),MTTF3,,1;
ACTIVITY;
COLCT,XX(33),MTTR3,,1;
ACTIVITY,,,ZAAB;
ZAAE COLCT,XX(29),MTTF4,,1;
ACTIVITY;
COLCT,XX(34),MTTR4,,1;
ACTIVITY,,,ZAAB;
ZAAF COLCT,XX(30),MTTF5,,1;
ACTIVITY;
COLCT,XX(35),MTTR5,,1;
ACTIVITY,,,ZAAB;
ZAAG COLCT,XX(22),MTBF2,,1;
ACTIVITY,,,RPAI;
ZAAH COLCT,XX(23),MTBF3,,1;
ACTIVITY,,,RPAI;
ZAAI COLCT,XX(24),MTBF4,,1;
ACTIVITY,,,RPAI;
ZAAJ COLCT,XX(25),MTBF5,,1;
ACTIVITY,,,RPAI;
WARN TERMINATE;
CREATE,300,300,,,1;
ACTIVITY;
EVENT,3,1;
ACTIVITY;
COLCT,XX(9),LKS_UP_300,,1;
ACTIVITY;
COLCT,XX(10),LKS_DWN_300,,1;
ACTIVITY;
COLCT,XX(14),PATHS_DWN_300,,1;
ACTIVITY;
TERMINATE;
CREATE,3600,3600,,,1;
ACTIVITY;
EVENT,4,1;
ACTIVITY;
TERMINATE;
CREATE,86400,86400,,,1;
ACTIVITY;
EVENT,5,1;
ACTIVITY;
TERMINATE;
END;
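The failure logic of the network model above can be restated compactly: failures arrive to the network as a Poisson stream with mean interarrival time 169 seconds (CREATE, EXPON(169)); each arrival draws UNFRM(0,77), which EVENT 1 maps to the link whose number satisfies J < ATRIB(2) ≤ LINK, and re-draws through PKLNK if that link is already down; and each failed link is repaired after an EXPON(754) delay. The Python sketch below is a simplified re-implementation for illustration only, not the SLAM II model itself: it tracks only the time-average number of links down (LNKSDWN), and the function name simulate_links_down is hypothetical. Note that in this logic an individual up link fails at rate λ̄/(N − n) when n links are down rather than at a fixed λ; by Little's law the long-run average number of links down is nevertheless approximately λ̄/μ (120/169 ≈ 0.710 for the validation network, 754/169 ≈ 4.4615 for the case study), since only failures arriving when every link is down are discarded.

```python
import heapq
import random


def simulate_links_down(n_links: int, mean_tbf: float, mean_repair: float,
                        horizon: float, seed: int = 1) -> float:
    """Time-average number of failed links over [0, horizon] (like LNKSDWN)."""
    rng = random.Random(seed)
    up = [True] * n_links
    # Event list entries: (time, kind, link); link = -1 for network-level failures.
    events = [(rng.expovariate(1.0 / mean_tbf), "fail", -1)]
    now, down, area = 0.0, 0, 0.0
    while events:
        t, kind, link = heapq.heappop(events)
        if t > horizon:
            break
        area += down * (t - now)              # integrate the number of links down
        now = t
        if kind == "fail":
            # Schedule the next network-level failure (CREATE, EXPON(mean_tbf)).
            heapq.heappush(events, (now + rng.expovariate(1.0 / mean_tbf), "fail", -1))
            if down == n_links:               # all links down: failure dropped (WARN branch)
                continue
            link = rng.randrange(n_links)
            while not up[link]:               # already failed: pick again (PKLNK loop)
                link = rng.randrange(n_links)
            up[link] = False
            down += 1
            repair_time = rng.expovariate(1.0 / mean_repair)
            heapq.heappush(events, (now + repair_time, "repair", link))
        else:
            up[link] = True
            down -= 1
    area += down * (horizon - now)            # time from the last event to the horizon
    return area / horizon


avg = simulate_links_down(5, 169.0, 120.0, 5.0e6, seed=1)
print(avg, 120 / 169)
```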


FORTRAN Insert Code

C
C  SLAM VARIABLES
C
C  ATRIB(1)=TIME OF LINK FAILURE
C  ATRIB(2)=LINK IDENTIFICATION #
C
C  XX(1)=TOTAL LINKS FAILED
C  XX(2)=TIME BETWEEN FAILURES (ALL LINKS)
C  XX(3)=BRANCHING VARIABLE
C    (0=CHOSEN LINK WAS UP AND IS NOW FAILED, 1=CHOSEN LINK IS
C     ALREADY FAILED SO A NEW LINK MUST BE CHOSEN, 2=ALL LINKS FAILED)
C  XX(4)=TOTAL LINKS REPAIRED
C  XX(5)=TOTAL DOWN TIME (ALL LINKS)
C  XX(6)=MTTF (CUMULATIVE - ALL LINKS)
C  XX(7)=MTTR (CUMULATIVE - ALL LINKS)
C  XX(8)=AVAILABILITY (CUMULATIVE - ALL LINKS)
C    (A = MTTF/(MTTF+MTTR))
C  XX(9)=# UP LINKS / STATE (STATE IS CHECKED EVERY 300 SECONDS)
C  XX(10)=# DOWN LINKS / STATE
C  XX(11)=NETWORK AVAILABILITY (CALCULATED EVERY 300 SECONDS)
C    (A=P=# UP LINKS/TOTAL LINKS)
C  XX(13)=# UP PATHS FROM (s-t) / STATE
C  XX(14)=# DOWN PATHS FROM (s-t) / STATE
C  XX(15)=# DOWN LINKS - CONTINUOUS
C  XX(16)=# UP LINKS - CONTINUOUS
C  XX(17)=NETWORK UNAVAILABILITY (CALCULATED EVERY 300 SECONDS)
C    (P=# DOWN LINKS/TOTAL LINKS)
C  XX(20)=BRANCHING VARIABLE
C    (1=LINK 16, 2-5=LINKS 2-5, 0=ANY OTHER LINK)
C  XX(21)=MTBF LINK 1 (LINK 16 FOR CASE STUDY)
C  XX(22)=MTBF LINK 2
C  XX(23)=MTBF LINK 3
C  XX(24)=MTBF LINK 4
C  XX(25)=MTBF LINK 5
C  XX(26)=MTTF LINK 1 (LINK 16 FOR CASE STUDY)
C  XX(27)=MTTF LINK 2
C  XX(28)=MTTF LINK 3
C  XX(29)=MTTF LINK 4
C  XX(30)=MTTF LINK 5
C  XX(31)=MTTR LINK 1 (LINK 16 FOR CASE STUDY)
C  XX(32)=MTTR LINK 2
C  XX(33)=MTTR LINK 3
C  XX(34)=MTTR LINK 4
C  XX(35)=MTTR LINK 5
C
C  ALL ARRAYS ARE PER LINK VARIABLES
C
C  ARRAY(LINK #,1)=LINK STATUS (1 = up, 0 = down)
C  ARRAY(LINK #,2)=TOTAL TIME BETWEEN FAILURES (TBF)
C  ARRAY(LINK #,3)=TOTAL # FAILURES (CLEARED HOURLY)
C  ARRAY(LINK #,4)=TOTAL DOWN TIME (CLEARED HOURLY)
C  ARRAY(LINK #,5)=TOTAL "UP" STATES (CLEARED HOURLY)
C    (STATES ARE CHECKED EVERY 300 SECONDS)
C  ARRAY(LINK #,6)=LINK AVAILABILITY (CLEARED HOURLY)
C    (A=P=UP TIME/TOTAL TIME)
C  ARRAY(LINK #,7)=LINK AVAILABILITY (CLEARED HOURLY)
C    (A=P=UP STATES/TOTAL STATES)
C  ARRAY(LINK #,8)=TOTAL UP TIME (CLEARED DAILY)
C  ARRAY(LINK #,9)=TOTAL "UP" STATES (CLEARED DAILY)
C  ARRAY(LINK #,10)=LINK AVAILABILITY (CLEARED DAILY)
C    (A=P=UP TIME/TOTAL TIME)
C  ARRAY(LINK #,11)=LINK AVAILABILITY (CLEARED DAILY)
C    (A=P=UP STATES/TOTAL STATES)
C  ARRAY(LINK #,12)=TOTAL FAILURES (CLEARED DAILY)
C  ARRAY(LINK #,13)=TOTAL DOWN TIME (TTR) (CLEARED DAILY)
C  ARRAY(LINK #,14)=PREVIOUS LINK FAILURE TIME
C  ARRAY(LINK #,15)=MTBF
C  ARRAY(LINK #,16)=MTTR
C  ARRAY(LINK #,17)=CUMULATIVE LINK AVAILABILITY
C    (A = LINK MTTF/(LINK MTTF+LINK MTTR))
C  ARRAY(LINK #,18)=TOTAL # FAILURES (RUNNING TOTAL)
C  ARRAY(LINK #,19)=TOTAL # REPAIRS
C  ARRAY(LINK #,20)=TIME REPAIR IS COMPLETED
C  ARRAY(LINK #,21)=TOTAL TTF
C  ARRAY(LINK #,22)=MTTF
      SUBROUTINE INTLC
$INCLUDE: 'PARAM.INC'
$INCLUDE: 'SCOM1.COM'
      parameter (maxn=50)
      parameter (maxa=10)
      parameter (maxp=200)
      integer nn,na,np,num(maxn,maxn)
      integer arc(maxn,0:maxa),path(maxp,0:maxn),LKPATH(maxp,0:maxn)
      integer HOUR,CNT,CTR,D,DY,DA,CT,WARN,LINKS
      real BTWF,DLYUP,LNKDWN,NUMFL,LKUP,DYUP,FAIL,DLYDWN
      common/ONE/NUMFL,LKUP,DYUP,BTWF,DLYUP,LNKDWN,FAIL,HOUR,CNT,CTR
     c,D,DY,DA,CT,DLYDWN,WARN,LINKS
      common/network/nn,na,np,arc,path,num,LKPATH
      EXTERNAL INPUT,ENUMPATH
      OPEN(UNIT=3,FILE='1GSHRL.OUT')
      OPEN(UNIT=5,FILE='1GSDLY.OUT')
      OPEN(UNIT=8,FILE='1LRUN.OUT')
      OPEN(UNIT=9,FILE='1LNKAV.OUT')
      OPEN(UNIT=10,FILE='1TBF.OUT')
      OPEN(UNIT=11,FILE='1STATE.OUT')
      OPEN(UNIT=12,FILE='1MTBF.OUT')
      OPEN(UNIT=13,FILE='1TTF.OUT')
      OPEN(UNIT=14,FILE='1TTR.OUT')
      OPEN(UNIT=15,FILE='1MTTF.OUT')
      OPEN(UNIT=16,FILE='1MTTR.OUT')
      OPEN(UNIT=17,FILE='1REL.OUT')
      WRITE(3,600) ' HOURLY CALCULATIONS PER LINK'
      WRITE(3,600) 'HOUR LINK FAILS UP-CHECKS UP-TIME DOWN-TI
     cME AVAIL-TIM AVAIL-CONF'
      WRITE(5,600) ' DAILY CALCULATIONS PER LINK'
      WRITE(5,600) 'DAY LINK FAILS UP-CHECKS UP-TIME DOWN-
     cTIME AVAIL-TIM AVAIL-CONF'
      WRITE(9,600) 'LINK AVAILABILITY'
      WRITE(10,600) ' LINK TBF'
      WRITE(11,600) ' EVERY 300 SECOND STATE CHECKS'
      WRITE(11,600) 'UP-LNKS DWN-LNKS P_UP = UP-LNKS/TTL LNKS P_DWN =
     cDWN-LINKS/TTL LNKS'
      WRITE(12,600) ' LINK MTBF'
      WRITE(13,600) ' LINK TTF'
      WRITE(14,600) ' LINK TTR'
      WRITE(15,600) ' LINK MTTF'
      WRITE(16,600) ' LINK MTTR'
      WRITE(17,600) 'PROPORTION OF UP-PATHS FOR GIVEN (s-t) EVERY 300
     cSEC'
      NUMFL=0.0
      LKUP=0.0
      DYUP=0.0
      FAIL=0.0
      HOUR=0
      DAY=0
      CNT=0
      CTR=287
      D=0
      DY=0
      DA=0
      CT=23
      WARN=0
      DUPATH=0.0
      HUPATH=0.0
      BTWF=0.0
      DLYUP=0.0
      LNKDWN=0.0
      DLYDWN=0.0
      LINKS=77
      xx(16)=links
      DO 100 L=1,LINKS
         CALL PUTARY(L,1,1.0)
  100 CONTINUE
      CALL INPUT()
      CALL ENUMPATH()
  600 FORMAT(/1X,A/)
      RETURN
      END


      subroutine input()
c
      parameter (maxn=50)
      parameter (maxa=10)
      parameter (maxp=200)
      integer nn,na,np,num(maxn,maxn)
      integer arc(maxn,0:maxa),path(maxp,0:maxn),LKPATH(maxp,0:maxn)
      common/network/nn,na,np,arc,path,num,LKPATH
c
      integer i,o,d,lknum
c
      open(unit=20,file='net.dat')
      read(20,'(i2)') nn
      read(20,'(i3)') na
      do 100 i=1,nn
         arc(i,0)=0
  100 continue
      do 200 i=1,na
         read(20,10) o,d,lknum
   10    format(i2,1x,i2,1x,i3)
         arc(o,0)=arc(o,0)+1
         arc(o,arc(o,0))=d
         num(o,d)=lknum
  200 continue
      close(20)
      return
      end
      subroutine enumpath()
c
      parameter (maxn=50)
      parameter (maxa=10)
      parameter (maxp=200)
      integer nn,na,np,num(maxn,maxn)
      integer arc(maxn,0:maxa),path(maxp,0:maxn),LKPATH(maxp,0:maxn)
      common/network/nn,na,np,arc,path,num,LKPATH
c
      integer i,j,flag,level,done,nodes(maxn),tree(maxn,0:maxa)
      INTEGER N,X,Y,NLINKS,P,NODE
      external fathom
c
      done=0
      level=1
      np=0
      do 100 i=1,nn
         tree(i,0)=0
         nodes(i)=0
  100 continue
      tree(1,0)=arc(1,0)
      nodes(1)=1
      nodes(arc(1,arc(1,0)))=1
      do 200 j=1,arc(1,0)
         tree(1,j)=arc(1,j)
  200 continue
      call fathom(tree,level,nodes)
  400 continue
      if(nodes(nn).eq.1) then
         np=np+1
         path(np,0)=level+1
         path(np,1)=1
         do 500 j=2,path(np,0)
            path(np,j)=tree(j-1,tree(j-1,0))
  500    continue
      endif
      if(tree(level,0).ne.0) then
         nodes(tree(level,tree(level,0)))=0
         tree(level,0)=tree(level,0)-1
      endif
      if(tree(level,0).ne.0) then
         nodes(tree(level,tree(level,0)))=1
         call fathom(tree,level,nodes)
      endif
      if(tree(level,0).eq.0) then
         flag=0
  300    continue
         if(level.ne.1) then
            level=level-1
            nodes(tree(level,tree(level,0)))=0
            tree(level,0)=tree(level,0)-1
         endif
         if(level.eq.1) then
            flag=1
         endif
         if(tree(level,0).ne.0) then
            nodes(tree(level,tree(level,0)))=1
            call fathom(tree,level,nodes)
            flag=1
         endif
         if(flag.eq.0) go to 300
         if(level.eq.1) then
            if(tree(level,0).eq.0) then
               done=1
            endif
         endif
      endif
      if(done.eq.0) go to 400
c     open(unit=19,file='1npath.out')
c     write(19,*)'Depth First Path Enumeration - By Node Number'
c     do 600 i=1,np
c        write(19,10)'There are ',path(i,0),' nodes in path number ',i
c        write(19,20)' ',(path(i,j),j=1,path(i,0))
c 600 continue
c     close(19)
C
C  BUILD LKPATH - SHOWS PATH BY LINK NUMBER
C
      DO 700 P=1,np
         NLINKS=0
         N=path(P,0)
         DO 800 NODE=1,N-1
            X=path(P,NODE)
            Y=path(P,NODE+1)
            LKPATH(P,NODE)=NUM(X,Y)
            NLINKS=NLINKS+1
  800    CONTINUE
         LKPATH(P,0)=NLINKS
  700 CONTINUE
      open(unit=18,file='1lpath.out')
      write(18,*)'Depth First Path Enumeration - By Link Number'
      do 900 i=1,np
         write(18,10)'There are ',lkpath(i,0),' links in path number ',i
   10    format(/1x,a10,i2,a22,i4)
         write(18,20)' ',(lkpath(i,j),j=1,lkpath(i,0))
   20    format(1x,a10,20i3//)
  900 continue
      close(18)
      return
      end


      subroutine fathom(tree,level,nodes)
c
      parameter (maxn=50)
      parameter (maxa=10)
      parameter (maxp=200)
      integer tree(maxn,0:maxa),level,nodes(maxn)
      integer nn,na,np,num(maxn,maxn)
      integer arc(maxn,0:maxa),path(maxp,0:maxn),LKPATH(maxp,0:maxn)
      common/network/nn,na,np,arc,path,num,LKPATH
c
      integer i,j,flag
c
  200 continue
      flag=0
      i=tree(level,tree(level,0))
      if(arc(i,0).ne.0) then
         level=level+1
         do 300 j=1,arc(i,0)
            if(nodes(arc(i,j)).eq.0) then
               tree(level,0)=tree(level,0)+1
               tree(level,tree(level,0))=arc(i,j)
               flag=1
            endif
  300    continue
         if(tree(level,0).ne.0) then
            nodes(tree(level,tree(level,0)))=1
         endif
      endif
      if(nodes(nn).eq.1) then
         flag=0
      endif
      if(flag.eq.1) go to 200
      return
      end
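Subroutines enumpath and fathom perform a depth-first enumeration of all paths from node 1 to node nn and then translate each node path into a list of link numbers (LKPATH); EVENT 3 later counts a path as operating only if every one of its links is up, and computes REL = # paths up / total paths. A compact recursive Python sketch of the same idea is given below for readers less familiar with FORTRAN 77. It is an illustrative re-implementation with hypothetical names, not a line-by-line translation of fathom; it keeps a visited set so that only simple paths are produced.

```python
from typing import Dict, List, Set, Tuple


def enumerate_link_paths(adjacency: Dict[int, List[int]],
                         link_number: Dict[Tuple[int, int], int],
                         source: int, sink: int) -> List[List[int]]:
    """Depth-first enumeration of all simple source->sink paths, as link-number lists."""
    paths: List[List[int]] = []

    def dfs(node: int, visited: Set[int], links: List[int]) -> None:
        if node == sink:
            paths.append(list(links))         # record a copy of the current path
            return
        for nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                links.append(link_number[(node, nxt)])
                dfs(nxt, visited, links)
                links.pop()                   # backtrack
                visited.remove(nxt)

    dfs(source, {source}, [])
    return paths


def proportion_paths_up(paths: List[List[int]], link_up: Dict[int, bool]) -> float:
    """Fraction of paths whose links are all up (REL = # paths up / total paths)."""
    up = sum(1 for p in paths if all(link_up[lk] for lk in p))
    return up / len(paths)


# Hypothetical four-node example: links 1..5, link 5 currently down.
adj = {1: [2, 3], 2: [3, 4], 3: [4], 4: []}
num = {(1, 2): 1, (1, 3): 2, (2, 3): 3, (2, 4): 4, (3, 4): 5}
paths = enumerate_link_paths(adj, num, 1, 4)
status = {k: k != 5 for k in range(1, 6)}
print(paths, proportion_paths_up(paths, status))
```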
      SUBROUTINE EVENT(I)
$INCLUDE: 'PARAM.INC'
$INCLUDE: 'SCOM1.COM'
      parameter (maxn=50)
      parameter (maxa=10)
      parameter (maxp=200)
      integer nn,na,np,num(maxn,maxn)
      integer arc(maxn,0:maxa),path(maxp,0:maxn),LKPATH(maxp,0:maxn)
      integer LINKS,LINK,LNK,J,LK,LNKSTS,HOUR,I,L,CNT,CTR,LS,D,
     cDY,DA,CT,WARN,DWN,STS,PTH,LKS,LNKNUM
      real LNKTIM,LNKBTW,LNKDWN,BTWF,LNKUP,AVAIL,DLYUP,AVBL,AVTIM,
     cAVCONF,DOWN,NUMFL,LKUP,DYUP,FAIL,NFAIL,DLYDWN,LAST,BETW,MTTFLK,
     cMTTRLK,AVLK,N,NUMRP,TTF,BACKUP,TTR,MTBFLK,LNKTTF,REL,STATUS
      common/ONE/NUMFL,LKUP,DYUP,BTWF,DLYUP,LNKDWN,FAIL,HOUR,CNT,CTR
     c,D,DY,DA,CT,DLYDWN,WARN,LINKS
      common/network/nn,na,np,arc,path,num,LKPATH
      GOTO(1,2,3,4,5),I
C
* LINK HAS BEEN CHOSEN TO FAIL - THIS EVENT SETS LINK STATUS TO DOWN
C
    1 CONTINUE
      DO 10 LINK=1,LINKS
         J=LINK-1
         IF ((ATRIB(2).GT.J).AND.(ATRIB(2).LE.LINK)) THEN
C
C  IF THE LINK IS ALREADY FAILED, A NEW LINK MUST BE CHOSEN
C
            LNK=GETARY(LINK,1)
            IF (LNK.EQ.0.0) THEN
C
C  COUNT TOTAL DOWN LINKS TO CHECK FOR ALL LINKS DOWN CONDITION
C
               DO 15 L=1,LINKS
                  LNKSTS=GETARY(L,1)
                  IF (LNKSTS.EQ.0.0) CNT=CNT+1
   15          CONTINUE
               IF (CNT.GE.LINKS) THEN
                  XX(3)=2.0
                  XX(1)=XX(1)-1
                  WRITE(8,600) 'WARNING - ALL LINKS HAVE FAILED!'
                  WARN=WARN+1
*                 READ(*,'(A)')
                  CNT=0
                  GO TO 100
               END IF
               XX(3)=1.0
               CNT=0
               GO TO 100
C
C  CHOSEN LINK IS FAILED
C
            ELSE
               CALL PUTARY(LINK,1,0.0)
               LS=GETARY(LINK,1)
C
C  CALC TOTAL TIME BETWEEN EACH LINK'S FAILURES (RUNNING TOTALS)
C
               LNKTIM=GETARY(LINK,14)
               LNKBTW=GETARY(LINK,2)
               BETW=ATRIB(1)-LNKTIM
               LNKBTW=LNKBTW+BETW
               CALL PUTARY(LINK,2,LNKBTW)
C
C  WRITE LINK'S TIME BETWEEN FAILURES TO FILE
C
               WRITE(10,606) LINK,BETW
C
C  COLLECT TBF PER LINK FOR MTBF (FOR SIM VALIDATION)
C
               IF (LINK.EQ.16) THEN
                  XX(21)=BETW
                  XX(20)=1
               ELSE
               IF (LINK.EQ.2) THEN
                  XX(22)=BETW
                  XX(20)=2
               ELSE
               IF (LINK.EQ.3) THEN
                  XX(23)=BETW
                  XX(20)=3
               ELSE
               IF (LINK.EQ.4) THEN
                  XX(24)=BETW
                  XX(20)=4
               ELSE
               IF (LINK.EQ.5) THEN
                  XX(25)=BETW
                  XX(20)=5
               ELSE
                  XX(20)=0
               END IF
               END IF
               END IF
               END IF
               END IF
C
C  SAVE LAST FAILURE TIME FOR NEXT BETWEEN CALC
C
               LAST=ATRIB(1)
               CALL PUTARY(LINK,14,LAST)
C
C  COUNT EACH LINK'S NUMBER OF FAILURES (CLEARED HOURLY)
C
               N=GETARY(LINK,3)
               N=N+1
               CALL PUTARY(LINK,3,N)
C
C  COUNT EACH LINK'S NUMBER OF FAILURES (RUNNING TOTAL)
C
               NUMFL=GETARY(LINK,18)
               NUMFL=NUMFL+1
               CALL PUTARY(LINK,18,NUMFL)
C
C  CALC MTBF FOR THE FAILING LINK AND WRITE TO FILE (CUMULATIVE)
C
               MTBFLK=LNKBTW/NUMFL
               CALL PUTARY(LINK,15,MTBFLK)
               WRITE(12,606) LINK,MTBFLK
C
C  COLLECT TOTAL TIME BETWEEN ALL LINK FAILURES
C
               BTWF=BTWF+XX(2)
C
C  RETURN TO NETWORK
C
               ATRIB(3)=LINK
               XX(3)=0
               CNT=0
               GO TO 100
            END IF
         END IF
   10 CONTINUE
  100 RETURN
C
C  THE LINK HAS BEEN REPAIRED - THIS EVENT SETS LINK STATUS TO UP
C
    2 CONTINUE
      LK=ATRIB(3)
      CALL PUTARY(LK,1,1.0)
C
C  COUNT NUMBER OF LINKS REPAIRED
C
      XX(4)=XX(4)+1
C
C  CALC EACH LINK'S TOTAL DOWN TIME
C
      DOWN=TNOW-ATRIB(1)
      LNKDWN=GETARY(LK,4)
      LNKDWN=LNKDWN+DOWN
      CALL PUTARY(LK,4,LNKDWN)
C
C  COUNT EACH LINK'S NUMBER OF REPAIRS (RUNNING TOTAL)
C
      NUMRP=GETARY(LK,19)
      NUMRP=NUMRP+1
      CALL PUTARY(LK,19,NUMRP)
C
C  CALC MTTR FOR THE REPAIRED LINK AND WRITE TO FILE (CUMULATIVE)
C
      MTTRLK=LNKDWN/NUMRP
      CALL PUTARY(LK,16,MTTRLK)
      WRITE(16,606) LK,MTTRLK
C
C  COLLECT TOTAL DOWN/REPAIR TIME (ALL LINKS)
C
      XX(5)=XX(5)+DOWN
C
C  CALC LINK'S TIME TO FAILURE AND COLLECT LINK'S TOTAL TTF (RUNNING TOTALS)
C
      BACKUP=GETARY(LK,20)
      TTF=ATRIB(1)-BACKUP
      LNKTTF=GETARY(LK,21)
      LNKTTF=LNKTTF+TTF
      CALL PUTARY(LK,21,LNKTTF)
C
C  SAVE LINK'S BACK-UP TIME FOR NEXT TTF CALC
C
      CALL PUTARY(LK,20,TNOW)
C
C  WRITE LINK'S TTF TO FILE AND TTR TO FILE
C
      TTR=DOWN
      WRITE(13,606) LK,TTF
      WRITE(14,606) LK,TTR
C
C  COLLECT TTF PER LINK FOR MTTF AND TTR PER LINK FOR MTTR (FOR SIM VALIDATE)
C
      IF (LK.EQ.16) THEN
         XX(26)=TTF
         XX(31)=TTR
         XX(20)=1
      ELSE
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IF  (LK.EQ.2)  THEN 
XX(27)=TTF 
XX(32)=TTR 
XX(20)=2 
ELSE 

IF(LK.EQ.3)THEN 

XX(28)=TTF 

XX(33)=TTR 

XX(20)=3 

ELSE 

IF  (LK.EQ.4)  THEN 
XX(29)=TTF 
XX(34)=TTR 
XX(20)=4
ELSE 

IF  (LK.EQ.5)  THEN 
XX(30)=TTF 
XX(35)=TTR 
XX(20)=5 
ELSE 
XX(20)=0 
END  IF 
END  IF 
END  IF 
END  IF 
END  IF 

C  CALC  MTTF  FOR  THE  FAILING  LINK  AND  WRITE  TO  FILE  (CUMULATIVE)

NUMFL=GETARY(LK,18)

MTTFLK=LNKTTF/NUMFL 
CALL  PUTARY(LK,22,MTTFLK) 

WRITE(15,606)  LK,MTTFLK 

C  CALC  CUMULATIVE  AVAIL  FOR  THE  REPAIRED  LINK

AVLK=MTTFLK/(MTTFLK+MTTRLK)

CALL  PUTARY(LK,17,AVLK)

WRITE(9,601)  LK,AVLK

C  CALC  MTTF  -  INSTANTANEOUS  ALL  LINKS

XX(6)=LNKTTF/XX(1)

C  CALC  MTTR  -  INSTANTANEOUS  ALL  LINKS
XX(7)=XX(5)/XX(4)

C  CALC  AVAILABILITY  USING  MTTF  AND  MTTR  -  INSTANTANEOUS  ALL  LINKS

XX(8)=XX(6)/(XX(6)+XX(7))

RETURN 




*  CHECK  STATE  EVERY  300  SECONDS 

*****************************************************************

3  CONTINUE 
XX(9)=0 

DO  30  L=1,LINKS 

C  COUNT  TOTAL  "UP"  LINKS  (GOAL  #2)

LNKSTS=GETARY(L,1)

IF  (LNKSTS.EQ.1.0)  THEN
XX(9)=XX(9)+1

C  COUNT  "UP"s  FOR  P  CALC  (GOAL  #3)

LKUP=GETARY(L,5)

LKUP=LKUP+1
CALL  PUTARY(L,5,LKUP)

END  IF
30  CONTINUE

C  COLLECT  NUMBER  OF  LINKS  DOWN  AT  EACH  STATE
XX(10)=LINKS-XX(9) 

C  CALC  AVAILABILITY  AS  P  =  #  LINKS  UP/TOTAL  LINKS  (GOAL  #2)
XX(11)=XX(9)/LINKS 

C  CALC  UNAVAILABILITY  AS  P  =  #  LINKS  DOWN/TOTAL  LINKS  (GOAL  #2)

XX(17)=XX(10)/LINKS 

C  WRITE  300  SEC  CHECKS  TO  FILE

CTR=CTR+1
IF  (CTR.EQ.288)  THEN
D=D+1

WRITE(11,605)  'DAY  =  ',D
CTR=0
END  IF

WRITE(11,604)  XX(9),XX(10),XX(11),XX(17)

C  COUNT  #  OPERATING  PATHS  (s-t)  (GOAL  #1)


DWN=0 
XX(13)=0.0 
DO  35  PTH=1,np
LKS=LKPATH(PTH,0) 
STATUS=1.0 
DO  36  LNKNUM=1,LKS
L=LKPATH(PTH,LNKNUM) 

STS=GETARY(L,1) 

STATUS=STATUS*STS 
IF(STATUS.EQ.0.0)  THEN 
DWN=DWN+1 
GO  TO  35 
END  IF 

36  CONTINUE 
XX(13)=XX(13)+1 
35  CONTINUE 
C 

C  CALC  RELIABILITY  AS  REL  =  #  PATHS  UP  /  TOTAL  PATHS  (GOAL  #1) 

C 

REL=XX(13)/np 
WRITE(17,607)  REL
XX(14)=np-XX(13)

C 

C  COLLECT  TOTAL  PATHS  UP  FOR  HRLY  P  CALC 
C 

HUPATH=HUPATH+XX(13)

RETURN 

*  HOURLY  CHECKS 

4  CONTINUE 
CT=CT+1 

IF  (CT.EQ.24)  THEN 
DA=DA+1

WRITE(3,605)  'DAY  =  ',DA 

*  WRITE(6,605)  'DAY  =  ',DA
CT=0 

END  IF 

HOUR=HOUR+1

C 

C  CALC  HOURLY  AVAILABILITY  =  LINK  UP  TIME/TOTAL  TIME  -  FOR  EACH  LINK  (GOAL  #3) 
C 

DO  40  L=1,LINKS
LNKDWN=GETARY(L,4) 

LNKUP=3600-LNKDWN 
AVAIL=LNKUP/3600 
CALL  PUTARY(L,6, AVAIL) 

C 

C  SAVE  LINK  UP  TIME  FOR  DAILY  CALC 
C 

DLYUP=GETARY(L,8) 

DLYUP=DLYUP+LNKUP 
CALL  PUTARY(L,8,DLYUP) 
C  SAVE  LINK  DOWN  TIME  FOR  DAILY  CALC

DLYDWN=GETARY(L,13)

DLYDWN=DLYDWN+LNKDWN 
CALL  PUTARY(L,13,DLYDWN) 

C  CALC  HOURLY  AVAILABILITY  AS  P  =  TOTAL  CONFORM/TOTAL  STATES  -  EACH  LINK


LKUP=GETARY(L,5)

AVBL=LKUP/12
CALL  PUTARY(L,7,AVBL)

C  SAVE  LINK  "UP"s  FOR  DAILY  CALC

DYUP=GETARY(L,9)

DYUP=DYUP+LKUP
CALL  PUTARY(L,9,DYUP)

C  WRITE  HOURLY  CALCS  TO  FILE

NFAIL=GETARY(L,3) 

FAIL=FAIL+NFAIL 
CALL  PUTARY(L,12,FAIL)

WRITE(3,603)HOUR,L,NFAIL,LKUP,LNKUP,LNKDWN,AVBL,AVAIL

C  CLEAR  EACH  LINK'S  TOTAL  FAILURES  HOURLY

CALL  PUTARY(L,3,0.0)

C  CLEAR  EACH  LINK'S  DOWN  TIME  HOURLY

CALL  PUTARY(L,4,0.0)

C  CLEAR  EACH  LINK'S  UP  TALLY  HOURLY

CALL  PUTARY(L,5,0.0)

40  CONTINUE 

IF  (HOUR.EQ.24)  HOUR=0 
RETURN 
*  DAILY  CALCULATIONS

*****************************************************************

5  CONTINUE 
DY=DY+1

C  CALC  DAILY  AVAILABILITY  =  LINK  UP  TIME/TOTAL  TIME  -  FOR  EACH  LINK  (GOAL  #3)

DO  50  L=1,LINKS
DLYUP=GETARY(L,8) 

AVTIM=DLYUP/86400 
CALL  PUTARY(L,10,AVTIM) 

C  CALC  DAILY  AVAILABILITY  AS  P  =  TOTAL  CONFORM/TOTAL  STATES  -  EACH  LINK  (G#3)

LKUP=GETARY(L,9) 

AVCONF=LKUP/288 
CALL  PUTARY(L,11,AVCONF)

C  WRITE  DAILY  CALCS  TO  FILE

NFAIL=GETARY(L,12)

DLYDWN=GETARY(L,13)

WRITE(5,608)DY,L,NFAIL,LKUP,DLYUP,DLYDWN,AVTIM,AVCONF 

C  CLEAR  EACH  LINK'S  DOWN  TIME  DAILY

CALL  PUTARY(L,13,0.0)

C  CLEAR  EACH  LINK'S  UP  TIME  DAILY

CALL  PUTARY(L,8,0.0)

C  CLEAR  EACH  LINK'S  "UP"  TALLY  DAILY

CALL  PUTARY(L,9,0.0)

50  CONTINUE 
C 

600  FORMAT(/1X,A/)

601  FORMAT(1X,I4,3X,F6.4)

602  FORMAT(1X,A,I4,4X,A,F6.4)

603  FORMAT(1X,I2,2X,I4,2X,F10.1,3X,F6.1,6X,F8.1,3X,F8.1,4X,F6.4,4X,F6.
c4)

604  FORMAT(1X,F6.1,3X,F6.1,3X,F6.4,18X,F6.4)

605  FORMAT(/1X,A,I4/)

606  FORMAT(1X,I4,1X,F8.1)

607  FORMAT(1X,F8.1)

608  FORMAT(1X,I4,2X,I4,2X,F10.1,3X,F6.1,6X,F8.1,3X,F8.1,4X,F6.4,5X,F6.
c4)

RETURN
END

SUBROUTINE  OTPUT
$INCLUDE:  'PARAM.INC'
$INCLUDE:  'SCOM1.COM'

integer  HOUR,CNT,CTR,D,DY, DA, CT,L,  WARN, LINKS 
real  BTWF,DLYUP,LNKDWN,NUMFL,LKUP,DYUP,FAIL,DLYDWN,FAILS
common/ONE/NUMFL,LKUP,DYUP,BTWF,DLYUP,LNKDWN,FAIL,HOUR,CNT,CTR
c,D,DY,DA,CT,DLYDWN,WARN,LINKS 

C  WRITE  NUMBER  OF  FAILURES  PER  LINK  TO  FILE

WRITE(8,104)  'TOTAL  WARNINGS  =  ',WARN
WRITE(8,103)  'TOTAL  LINK  FAILURES  =  ',XX(1) 

WRITE(8,100)'LINK  FAILURES' 

DO  10  L=1,LINKS 
FAILS=GETARY(L,18)

WRITE(8,102)  L,FAILS 
10  CONTINUE 

100  FORMAT(/1X,A/)

101  FORMAT(1X,F6.1)

102  FORMAT(1X,I4,F10.1)

103  FORMAT(/1X,A,F12.1/)

104  FORMAT(/1X,A,I6/)

RETURN 

END 
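The three link-availability estimates computed by the EVENT routine above can be restated compactly. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it simply mirrors the formulas in the listing (up time over elapsed time, the fraction of 300-second state checks that find a link up, and MTTF/(MTTF + MTTR) using failure and repair counts).

```python
# Hypothetical helper names; the formulas follow the FORTRAN EVENT routine above.
CHECK_INTERVAL_S = 300                        # link states are sampled every 300 s
CHECKS_PER_HOUR = 3600 // CHECK_INTERVAL_S    # 12 samples per hour
CHECKS_PER_DAY = 86400 // CHECK_INTERVAL_S    # 288 samples per day


def time_availability(down_s: float, period_s: float = 3600.0) -> float:
    """Up time / total time for one link over a period (AVAIL, AVTIM)."""
    return (period_s - down_s) / period_s


def state_availability(up_checks: int, checks: int = CHECKS_PER_HOUR) -> float:
    """Fraction of 300-second state checks that found the link up (AVBL, AVCONF)."""
    return up_checks / checks


def mttf_mttr_availability(total_ttf: float, failures: int,
                           total_ttr: float, repairs: int) -> float:
    """Cumulative availability MTTF / (MTTF + MTTR) for one link (AVLK)."""
    mttf = total_ttf / failures   # mean time to failure
    mttr = total_ttr / repairs    # mean time to repair
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    print(time_availability(540.0))                       # link down 9 minutes in the hour
    print(state_availability(10))                         # link up at 10 of 12 checks
    print(mttf_mttr_availability(36000.0, 4, 3600.0, 4))  # MTTF 9000 s, MTTR 900 s
```

The time-based and state-sampling estimates measure the same quantity at different resolutions; the sampled version is what feeds the p charts, since each 300-second check is a conforming/nonconforming observation.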
Path  Enumeration  Output  -  Case  Study  Network


Depth  First  Path  Enumeration  -  By  Link  Number 

There  are  5  links  in  path  number  1 
60  6  9  15  77 

There  are  5  links  in  path  number  2 
60  6  9  14  76 

There  are  7  links  in  path  number  3 
60  6  9  13  25  54  75 

There  are  7  links  in  path  number  4 
60  6  9  13  24  53  74 

There  are  7  links  in  path  number  5 
60  6  9  13  23  52  73 

There  are  7  links  in  path  number  6 
60  6  9  13  22  51  77 

There  are  7  links  in  path  number  7 
60  6  9  13  21  50  76 

There  are  7  links  in  path  number  8 
60  6  9  13  20  49  72 

There  are  7  links  in  path  number  9 
60  6  9  13  20  48  71 

There  are  7  links  in  path  number  10 
60  6  9  13  20  47  68 

There  are  7  links  in  path  number  11
60  6  9  13  19  46  74 

There  are  7  links  in  path  number  12 
60  6  9  13  19  45  71 

There  are  7  links  in  path  number  13 
60  6  9  13  19  44  70 

There  are  7  links  in  path  number  14 
60  6  9  13  19  43  69 

There  are  7  links  in  path  number  15 
60  6  9  13  19  42  68 
There  are  7  links  in  path  number  16
60  6  9  13  18  41  70 

There  are  7  links  in  path  number  17 
60  6  9  13  18  40  69 

There  are  7  links  in  path  number  18 
60  6  9  13  17  39  67 

There  are  7  links  in  path  number  19 
60  6  9  13  17  38  66 

There  are  7  links  in  path  number  20 
60  6  9  13  17  37  65 

There  are  7  links  in  path  number  21 
60  6  9  13  17  36  64 

There  are  7  links  in  path  number  22 
60  6  9  13  17  35  63 

There  are  7  links  in  path  number  23 
60  6  9  13  17  34  62 

There  are  7  links  in  path  number  24 
60  6  9  13  17  33  61 

There  are  7  links  in  path  number  25 
60  6  9  13  16  32  67 

There  are  7  links  in  path  number  26 
60  6  9  13  16  31  66 

There  are  7  links  in  path  number  27 
60  6  9  13  16  30  65

There  are  7  links  in  path  number  28
60  6  9  13  16  29  64 

There  are  7  links  in  path  number  29 
60  6  9  13  16  28  63 

There  are  7  links  in  path  number  30 
60  6  9  13  16  27  62 

There  are  7  links  in  path  number  31
60  6  9  13  16  26  61 


There  are  5  links  in  path  number  32 
60  6  9  12  62 


There  are  5  links  in  path  number  33 
60  6  9  11  61

There  are  5  links  in  path  number  34 
59  5  8  15  77 

There  are  5  links  in  path  number  35 
59  5  8  14  76 

There  are  7  links  in  path  number  36 
59  5  8  13  25  54  75 

There  are  7  links  in  path  number  37 
59  5  8  13  24  53  74 

There  are  7  links  in  path  number  38 
59  5  8  13  23  52  73 

There  are  7  links  in  path  number  39 
59  5  8  13  22  51  77 

There  are  7  links  in  path  number  40 
59  5  8  13  21  50  76 

There  are  7  links  in  path  number  41 
59  5  8  13  20  49  72 

There  are  7  links  in  path  number  42 
59  5  8  13  20  48  71 

There  are  7  links  in  path  number  43 
59  5  8  13  20  47  68 

There  are  7  links  in  path  number  44 
59  5  8  13  19  46  74 

There  are  7  links  in  path  number  45 
59  5  8  13  19  45  71 

There  are  7  links  in  path  number  46 
59  5  8  13  19  44  70 

There  are  7  links  in  path  number  47 
59  5  8  13  19  43  69 

There  are  7  links  in  path  number  48 
59  5  8  13  19  42  68 


There  are  7  links  in  path  number  49 
59  5  8  13  18  41  70 


There  are  7  links  in  path  number  50 
59  5  8  13  18  40  69 

There  are  7  links  in  path  number  51
59  5  8  13  17  39  67 

There  are  7  links  in  path  number  52 
59  5  8  13  17  38  66 

There  are  7  links  in  path  number  53 
59  5  8  13  17  37  65 

There  are  7  links  in  path  number  54 
59  5  8  13  17  36  64 

There  are  7  links  in  path  number  55 
59  5  8  13  17  35  63 

There  are  7  links  in  path  number  56 
59  5  8  13  17  34  62 

There  are  7  links  in  path  number  57 
59  5  8  13  17  33  61 

There  are  7  links  in  path  number  58 
59  5  8  13  16  32  67 

There  are  7  links  in  path  number  59 
59  5  8  13  16  31  66 

There  are  7  links  in  path  number  60 
59  5  8  13  16  30  65 

There  are  7  links  in  path  number  61 
59  5  8  13  16  29  64 

There  are  7  links  in  path  number  62 
59  5  8  13  16  28  63

There  are  7  links  in  path  number  63 
59  5  8  13  16  27  62 

There  are  7  links  in  path  number  64 
59  5  8  13  16  26  61 

There  are  5  links  in  path  number  65 
59  5  8  12  62 


There  are  5  links  in  path  number  66 
59  5  8  11  61 


There  are  4  links  in  path  number  67 
58  4  15  77 

There  are  4  links  in  path  number  68 
58  4  14  76 

There  are  6  links  in  path  number  69 
58  4  13  25  54  75 

There  are  6  links  in  path  number  70 
58  4  13  24  53  74 

There  are  6  links  in  path  number  71 
58  4  13  23  52  73 

There  are  6  links  in  path  number  72 
58  4  13  22  51  77 

There  are  6  links  in  path  number  73 
58  4  13  21  50  76 

There  are  6  links  in  path  number  74 
58  4  13  20  49  72 

There  are  6  links  in  path  number  75 
58  4  13  20  48  71 

There  are  6  links  in  path  number  76 
58  4  13  20  47  68 

There  are  6  links  in  path  number  77 
58  4  13  19  46  74 

There  are  6  links  in  path  number  78 
58  4  13  19  45  71 

There  are  6  links  in  path  number  79 
58  4  13  19  44  70

There  are  6  links  in  path  number  80 
58  4  13  19  43  69 

There  are  6  links  in  path  number  81 
58  4  13  19  42  68 

There  are  6  links  in  path  number  82 
58  4  13  18  41  70 
There  are  6  links  in  path  number  83
58  4  13  18  40  69 


There  are  6  links  in  path  number  84 
58  4  13  17  39  67 

There  are  6  links  in  path  number  85 
58  4  13  17  38  66 

There  are  6  links  in  path  number  86 
58  4  13  17  37  65 

There  are  6  links  in  path  number  87 
58  4  13  17  36  64 

There  are  6  links  in  path  number  88 
58  4  13  17  35  63 

There  are  6  links  in  path  number  89 
58  4  13  17  34  62 

There  are  6  links  in  path  number  90 
58  4  13  17  33  61 

There  are  6  links  in  path  number  91 
58  4  13  16  32  67 

There  are  6  links  in  path  number  92 
58  4  13  16  31  66 

There  are  6  links  in  path  number  93 
58  4  13  16  30  65 

There  are  6  links  in  path  number  94 
58  4  13  16  29  64 

There  are  6  links  in  path  number  95 
58  4  13  16  28  63 

There  are  6  links  in  path  number  96 
58  4  13  16  27  62 

There  are  6  links  in  path  number  97 
58  4  13  16  26  61 

There  are  4  links  in  path  number  98 
58  4  12  62 

There  are  4  links  in  path  number  99 
58  4  11  61 


There  are  6  links  in  path  number  100 
57  3  7  10  15  77 


There  are  6  links  in  path  number  101 
57  3  7  10  14  76 

There  are  8  links  in  path  number  102 
57  3  7  10  13  25  54  75 

There  are  8  links  in  path  number  103 
57  3  7  10  13  24  53  74 

There  are  8  links  in  path  number  104 
57  3  7  10  13  23  52  73 

There  are  8  links  in  path  number  105 
57  3  7  10  13  22  51  77 

There  are  8  links  in  path  number  106 
57  3  7  10  13  21  50  76 

There  are  8  links  in  path  number  107 
57  3  7  10  13  20  49  72 

There  are  8  links  in  path  number  108 
57  3  7  10  13  20  48  71 

There  are  8  links  in  path  number  109 
57  3  7  10  13  20  47  68 

There  are  8  links  in  path  number  110
57  3  7  10  13  19  46  74

There  are  8  links  in  path  number  111
57  3  7  10  13  19  45  71

There  are  8  links  in  path  number  112
57  3  7  10  13  19  44  70

There  are  8  links  in  path  number  113
57  3  7  10  13  19  43  69

There  are  8  links  in  path  number  114
57  3  7  10  13  19  42  68

There  are  8  links  in  path  number  115
57  3  7  10  13  18  41  70

There  are  8  links  in  path  number  116
57  3  7  10  13  18  40  69 
There  are  8  links  in  path  number  117
57  3  7  10  13  17  39  67 


There  are  8  links  in  path  number  118 
57  3  7  10  13  17  38  66 

There  are  8  links  in  path  number  119
57  3  7  10  13  17  37  65 

There  are  8  links  in  path  number  120 
57  3  7  10  13  17  36  64 

There  are  8  links  in  path  number  121 
57  3  7  10  13  17  35  63 

There  are  8  links  in  path  number  122 
57  3  7  10  13  17  34  62 

There  are  8  links  in  path  number  123 
57  3  7  10  13  17  33  61 

There  are  8  links  in  path  number  124 
57  3  7  10  13  16  32  67 

There  are  8  links  in  path  number  125 
57  3  7  10  13  16  31  66 

There  are  8  links  in  path  number  126 
57  3  7  10  13  16  30  65 

There  are  8  links  in  path  number  127 
57  3  7  10  13  16  29  64 

There  are  8  links  in  path  number  128 
57  3  7  10  13  16  28  63 

There  are  8  links  in  path  number  129 
57  3  7  10  13  16  27  62 

There  are  8  links  in  path  number  130 
57  3  7  10  13  16  26  61 

There  are  6  links  in  path  number  131
57  3  7  10  12  62 

There  are  6  links  in  path  number  132 
57  3  7  10  11  61 

There  are  4  links  in  path  number  133 
56  2  15  77 


There  are  4  links  in  path  number  134 
56  2  14  76 


There  are  6  links  in  path  number  135 
56  2  13  25  54  75 

There  are  6  links  in  path  number  136 
56  2  13  24  53  74 

There  are  6  links  in  path  number  137 
56  2  13  23  52  73 

There  are  6  links  in  path  number  138 
56  2  13  22  51  77 

There  are  6  links  in  path  number  139 
56  2  13  21  50  76 

There  are  6  links  in  path  number  140 
56  2  13  20  49  72 

There  are  6  links  in  path  number  141
56  2  13  20  48  71 

There  are  6  links  in  path  number  142 
56  2  13  20  47  68 

There  are  6  links  in  path  number  143 
56  2  13  19  46  74 

There  are  6  links  in  path  number  144 
56  2  13  19  45  71 

There  are  6  links  in  path  number  145 
56  2  13  19  44  70 

There  are  6  links  in  path  number  146 
56  2  13  19  43  69 

There  are  6  links  in  path  number  147 
56  2  13  19  42  68

There  are  6  links  in  path  number  148
56  2  13  18  41  70 

There  are  6  links  in  path  number  149 
56  2  13  18  40  69 

There  are  6  links  in  path  number  150 
56  2  13  17  39  67 


There  are  6  links  in  path  number  151 
56  2  13  17  38  66 


There  are  6  links  in  path  number  152 
56  2  13  17  37  65 

There  are  6  links  in  path  number  153 
56  2  13  17  36  64 

There  are  6  links  in  path  number  154 
56  2  13  17  35  63 

There  are  6  links  in  path  number  155 
56  2  13  17  34  62 

There  are  6  links  in  path  number  156 
56  2  13  17  33  61 

There  are  6  links  in  path  number  157 
56  2  13  16  32  67 

There  are  6  links  in  path  number  158 
56  2  13  16  31  66 

There  are  6  links  in  path  number  159 
56  2  13  16  30  65 

There  are  6  links  in  path  number  160 
56  2  13  16  29  64 

There  are  6  links  in  path  number  161 
56  2  13  16  28  63 

There  are  6  links  in  path  number  162 
56  2  13  16  27  62 

There  are  6  links  in  path  number  163 
56  2  13  16  26  61 

There  are  4  links  in  path  number  164 
56  2  12  62 

There  are  4  links  in  path  number  165 
56  2  11  61 

There  are  4  links  in  path  number  166 
55  1  15  77 

There  are  4  links  in  path  number  167 
55  1  14  76 


There  are  6  links  in  path  number  168 
55  1  13  25  54  75 


There  are  6  links  in  path  number  169 
55  1  13  24  53  74 

There  are  6  links  in  path  number  170 
55  1  13  23  52  73 

There  are  6  links  in  path  number  171 
55  1  13  22  51  77 

There  are  6  links  in  path  number  172 
55  1  13  21  50  76 

There  are  6  links  in  path  number  173 
55  1  13  20  49  72 

There  are  6  links  in  path  number  174 
55  1  13  20  48  71 

There  are  6  links  in  path  number  175 
55  1  13  20  47  68 

There  are  6  links  in  path  number  176
55  1  13  19  46  74 

There  are  6  links  in  path  number  177 
55  1  13  19  45  71 

There  are  6  links  in  path  number  178 
55  1  13  19  44  70 

There  are  6  links  in  path  number  179 
55  1  13  19  43  69 

There  are  6  links  in  path  number  180 
55  1  13  19  42  68 

There  are  6  links  in  path  number  181
55  1  13  18  41  70

There  are  6  links  in  path  number  182
55  1  13  18  40  69 

There  are  6  links  in  path  number  183 
55  1  13  17  39  67 

There  are  6  links  in  path  number  184 
55  1  13  17  38  66 


There  are  6  links  in  path  number  185 
55  1  13  17  37  65 


There  are  6  links  in  path  number  186 
55  1  13  17  36  64 

There  are  6  links  in  path  number  187 
55  1  13  17  35  63 

There  are  6  links  in  path  number  188 
55  1  13  17  34  62 

There  are  6  links  in  path  number  189 
55  1  13  17  33  61 

There  are  6  links  in  path  number  190 
55  1  13  16  32  67 

There  are  6  links  in  path  number  191 
55  1  13  16  31  66 

There  are  6  links  in  path  number  192 
55  1  13  16  30  65 

There  are  6  links  in  path  number  193 
55  1  13  16  29  64 

There  are  6  links  in  path  number  194 
55  1  13  16  28  63 

There  are  6  links  in  path  number  195 
55  1  13  16  27  62 

There  are  6  links  in  path  number  196 
55  1  13  16  26  61 

There  are  4  links  in  path  number  197 
55  1  12  62 

There  are  4  links  in  path  number  198 
55  1  11  61
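The reliability measure counted at each 300-second check (REL = paths up / total paths) treats every enumerated path as a series system: a path operates only if all of its links are up. A minimal Python sketch under that reading follows; the function names are hypothetical, and only the first three enumerated paths are used for illustration.

```python
from typing import Dict, List, Sequence


def path_is_up(path: Sequence[int], link_up: Dict[int, bool]) -> bool:
    """A path works only if every link on it is up (links in series)."""
    return all(link_up[link] for link in path)


def path_reliability(paths: List[Sequence[int]], link_up: Dict[int, bool]) -> float:
    """REL = number of operating s-t paths / total number of paths."""
    operating = sum(1 for path in paths if path_is_up(path, link_up))
    return operating / len(paths)


# First three paths from the enumeration output above.
PATHS = [
    [60, 6, 9, 15, 77],
    [60, 6, 9, 14, 76],
    [60, 6, 9, 13, 25, 54, 75],
]

if __name__ == "__main__":
    status = {link: True for path in PATHS for link in path}
    status[15] = False                      # failing link 15 cuts only the first path
    print(path_reliability(PATHS, status))  # two of the three paths still operate
```

Because links such as 60, 6, and 9 are shared by many paths, a single failure on one of them removes a large block of paths at once, which is why this measure reacts more sharply than the link-count availability.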


APPENDIX  G:  How  to  Use  The  EXCEL  Control  Charting  Spreadsheets 


There are two spreadsheets for each type of control chart: the first computes the control
limits and the second constructs the control chart. Spreadsheets were written for four types
of control charts: the p chart, the np chart, the XmR chart, and the x-bar and R chart. The
use of each is discussed below. Upon opening any of the spreadsheets, it is wise to use the
‘Save as’ command immediately to save it under a working filename; this helps prevent
corruption of the original files. The chart spreadsheets are designed to hold up to 720 data
points, so a full spreadsheet can be large. A minimum of 8 MB of RAM is recommended,
especially if more than one spreadsheet is open at a time. Save data often in case your RAM
limitations are exceeded.

p  Chart 

p Control Limits (plim.xls)

The control limits can be computed either by estimating the standards from entered sample
data or by simply entering the theoretical standards if they are known. If the theoretical
standards are known, enter them into the second cell under the headings ‘pbar’ (mean) and
‘std dev’ (standard deviation). Notice that this row is labeled ‘Theoretical’ at the far right.
The control limits will appear under their appropriate column headings of ‘LCL’, ‘CL’, and
‘UCL’ (the Lower Control Limit, Center Line, and Upper Control Limit, respectively). The
1-sigma and 2-sigma lower and upper warning limits, for optional use on the control charts,
will also appear under the headings below the control limits. A sample ‘plim.xls’
spreadsheet is shown next for reference:
PerfMeas: (performance measure name)              Smpl Size: 77

SampleNum   p Value
    1       0.987
    2       0.961
    3       0.961
    4       0.961
    5       (illegible)
    6       0.9481
    7       0.9351
    8       0.9221
    9       0.9481
   10       0.9481
   11       (illegible)
   12       0.9481
   13       (illegible)
   14       0.8961
   15       (illegible)

               pbar        std dev     LCL(p)      CL(p)       UCL(p)
Estimated      0.9420132   0.0266347   0.8621091   0.9420132   1
Theoretical    0.9421      0.0266      0.8623      0.9421      1

               L_1SIGMA    U_1SIGMA    L_2SIGMA    U_2SIGMA
Estimated      0.9153785   0.9686479   0.8887438   0.9952826
Theoretical    0.9155      0.9687      0.8889      0.9953

Sample  ‘plim.xls’  Spreadsheet  for  p  Chart  Control  Limits 


If the standards need to be estimated, the sample data on the desired p value should be
entered under the column heading ‘p Value’. A column is provided to enter the sample number
if desired. The sample size used to compute the sample p values must be entered under the
heading ‘Smpl Size’ at the far left. Also located at the very top left of the spreadsheet is a
cell for the name of the performance measure being used, if desired. Once again, the control
limits will appear under their appropriate headings in the first row, labeled ‘Estimated’ at
the far right. The estimated 1-sigma and 2-sigma lower and upper warning limits will also
appear under their appropriate headings in the row labeled ‘Estimated’. Additionally, there
is a column with the heading ‘Out-of-Control?’ in the middle of the spreadsheet. The formula
contained in the first cell under this heading can be copied and pasted down this column for
each of the entered p values; it indicates whether any of the sample p values are out of
control with respect to the estimated control limits. Any out-of-control p values so
indicated can then be investigated for exclusion from the control limit computations (see
Section 3.1.5, Trial Control Limits).
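The limits that ‘plim.xls’ reports can be reproduced with the standard three-sigma p chart formulas, LCL/UCL = pbar ± 3·sqrt(pbar(1 - pbar)/n), with warning limits at ± 1 and ± 2 standard deviations. The Python sketch below is an assumption-labeled illustration, not the spreadsheet's own formulas: the function names are hypothetical, and clipping every limit to [0, 1] is assumed (the sample spreadsheet shows a UCL of 1, consistent with such clipping). With pbar = 0.9420132 and n = 77 it gives the estimated values shown in the sample spreadsheet (std dev 0.0266347, LCL 0.8621091, UCL 1).

```python
import math
from typing import Dict, List, Sequence


def estimate_pbar(p_values: Sequence[float]) -> float:
    """Center line estimated as the mean of the sample proportions (equal sample sizes)."""
    return sum(p_values) / len(p_values)


def p_chart_limits(pbar: float, n: int) -> Dict[str, float]:
    """Three-sigma p chart limits plus 1- and 2-sigma warning limits, clipped to [0, 1]."""
    sd = math.sqrt(pbar * (1.0 - pbar) / n)  # standard deviation of a sample proportion

    def clip(x: float) -> float:
        return min(1.0, max(0.0, x))

    return {
        "std_dev": sd,
        "LCL": clip(pbar - 3 * sd),
        "CL": pbar,
        "UCL": clip(pbar + 3 * sd),
        "L_1SIGMA": clip(pbar - sd),
        "U_1SIGMA": clip(pbar + sd),
        "L_2SIGMA": clip(pbar - 2 * sd),
        "U_2SIGMA": clip(pbar + 2 * sd),
    }


def out_of_control(p_values: Sequence[float], limits: Dict[str, float]) -> List[bool]:
    """Flag each sample that falls outside the three-sigma control limits."""
    return [p < limits["LCL"] or p > limits["UCL"] for p in p_values]


if __name__ == "__main__":
    limits = p_chart_limits(0.9420132, 77)
    print(limits)
    print(out_of_control([0.987, 0.961, 0.85], limits))
```

Flagged samples are only candidates for exclusion; removing them and recomputing pbar gives revised trial limits, in the spirit of the trial-limit procedure referenced above.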
p Control Chart (p.xls)

Upon opening ‘p.xls’, EXCEL will ask ‘This document contains links. Reestablish links?’
Answer No. When the blue interface screen appears, save the file as a working file. The
interface screen has three buttons: (1) Select a Performance Measure (or insert a new one),
(2) Edit the Control (and Warning) Limits, and (3) Delete a Performance Measure. When
starting a new file, either (1) or (2) can be done first.

For (2), Edit the Control (and Warning) Limits, simply press the button and enter the
control and warning limits, as previously calculated using ‘plim.xls’, in their appropriate
boxes. The limits entered here apply to the charts of all performance measures in the entire
spreadsheet; however, they can be edited at any time. Warning limits are optional: entering
zeroes in the upper 1-sigma and/or upper 2-sigma limit boxes means the respective warning
limits are not plotted.

For (1), Select a Performance Measure, again press the button and either enter a new
performance measure or select an existing one from the drop-down list (the name NewMeas has
no data associated with it). After pressing OK, EXCEL will ask ‘Selection too big. Continue
without undo?’ Answer YES. A spreadsheet identified by the performance measure name entered
will appear. Data can now be entered one point at a time using the gray button labeled
‘Enter Data’ to the right, or pasted in from another existing spreadsheet. To paste in a
block of data, the performance measure sheet must be unprotected; use the ‘Tools,
Protection, Unprotect Sheet’ command from the pull-down menus to accomplish this. The p
values can be pasted into the column labeled ‘p value.’ The column labeled ‘Smpl Number’
must be filled for each p value in this spreadsheet (as opposed to its optional status in
‘plim.xls’); this can be done quickly using the ‘Edit, Fill, Series’ command from the
pull-down menus. Next, the ‘Total Samples Available’ box must be filled in, and the ‘Last’
point to be plotted must also be entered (be sure to either press the Enter key or click on
another cell in the worksheet after entering the ‘Last’ point, or the ‘Plot’ button will not
work). Now select the ‘Plot’ button; the chart is plotted automatically. Once this is
complete, there may be a diagonal line of points on the chart. To remedy this, select the
chart by clicking it once, and then select the ‘Chart Wizard’ button on the toolbar. When
the ‘Chart Wizard - Step 1 of 2’ box appears, select the ‘Next’ button. In the ‘Chart
Wizard - Step 2 of 2’ box that follows, make sure the box labeled ‘Use First (0) Columns for
X Data’ contains a ‘1’, and that the box labeled ‘Use First (0) Rows for Legend Text’ also
contains a ‘1’. This should clear up any errors in the appearance of the chart, which is now
finished. The blue interface screen can be selected again for editing the control limits or
entering/selecting other performance measures as desired; the pull-down menu labeled
‘Control Charts’ at the top of the screen returns to the interface screen. Any editing of
the control or warning limits is shown immediately on the currently plotted chart. If more
data points are added to a performance measure, make sure the ‘Total Samples Available’ and
‘Last’ point boxes are updated correctly and then select the ‘Plot’ button. As currently
written, the spreadsheet can handle up to 720 data points.

For (3), Delete a Performance Measure, press the button, select the desired performance
measure from the drop-down list of existing ones, and select OK.


np  Chart 

np Control Limits (nplim.xls)

The  ‘nplim.xls’  spreadsheet  is  operated  the  same  as  ‘plim.xls’  above  except  that 
np  values  must  be  entered  instead  of  p  values. 
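Because an np chart plots counts rather than proportions, its limits are simply n times the p chart limits: CL = n·pbar and LCL/UCL = n·pbar ± 3·sqrt(n·pbar(1 - pbar)). A hedged Python sketch follows; the function name is hypothetical, and clipping the limits to [0, n] is an assumption that mirrors clipping p limits to [0, 1].

```python
import math
from typing import Dict


def np_chart_limits(pbar: float, n: int) -> Dict[str, float]:
    """Three-sigma np chart limits for counts out of a constant sample size n.

    The center line is n * pbar and sigma is sqrt(n * pbar * (1 - pbar)); limits
    are clipped to [0, n] since a count cannot fall outside that range.
    """
    cl = n * pbar
    sd = math.sqrt(n * pbar * (1.0 - pbar))
    return {
        "std_dev": sd,
        "LCL": max(0.0, cl - 3 * sd),
        "CL": cl,
        "UCL": min(float(n), cl + 3 * sd),
    }


if __name__ == "__main__":
    # Same pbar and sample size as the p chart example: np limits are n times p limits.
    print(np_chart_limits(0.9420132, 77))
```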
np Control Chart (np.xls)

The ‘np.xls’ spreadsheet is operated the same as ‘p.xls’ above except that np values must be
entered instead of p values.

XmR  Charts 

XmR Control Limits (xmrlim.xls)

The ‘xmrlim.xls’ spreadsheet is operated the same as ‘plim.xls’ above with the following
exceptions:

For the theoretical limits calculation, the known standards must be entered into the second
cell under the headings ‘Xbar’ (mean) and ‘mRbar’ (standard deviation). Note that the
standard deviation (not the theoretical mRbar) must be entered into the second cell under
mRbar. Notice again that this row is labeled ‘Theoretical’ at the far right. The control
limits will appear under their appropriate column headings of ‘LCL’, ‘CL’, and ‘UCL’ for
both the X and mR charts, as will the appropriate warning limits. See the sample
‘xmrlim.xls’ spreadsheet below for reference.
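For known standards, the X chart limits are the mean ± 3 standard deviations (with warning limits at ± 1 and ± 2), while the mR chart for moving ranges of two values has center line d2·sigma and limits (d2 ± 3·d3)·sigma, with d2 = 1.128 and d3 = 0.853. This is consistent with the instruction to enter the standard deviation rather than mRbar in the ‘Theoretical’ row. The Python sketch below assumes these textbook formulas; whether ‘xmrlim.xls’ uses exactly the same constants, and whether it floors the X limits at zero (as suits a nonnegative measure), are assumptions. The function name and the inputs 4.46 and 2.0494 are illustrative.

```python
from typing import Dict

D2_N2 = 1.128  # d2 for moving ranges of two consecutive values
D3_N2 = 0.853  # d3 for moving ranges of two consecutive values


def xmr_limits_known(mean: float, sigma: float,
                     nonnegative: bool = True) -> Dict[str, float]:
    """X and mR chart limits from a known mean and standard deviation."""
    floor = 0.0 if nonnegative else float("-inf")  # assumed floor for the X limits
    return {
        # X chart: mean +/- 3 sigma, warning limits at +/- 1 and 2 sigma
        "LCL_X": max(floor, mean - 3 * sigma),
        "CL_X": mean,
        "UCL_X": mean + 3 * sigma,
        "L_1SIGMA_X": max(floor, mean - sigma),
        "U_1SIGMA_X": mean + sigma,
        "L_2SIGMA_X": max(floor, mean - 2 * sigma),
        "U_2SIGMA_X": mean + 2 * sigma,
        # mR chart: center d2*sigma, limits (d2 -/+ 3*d3)*sigma, never below zero
        "LCL_mR": max(0.0, (D2_N2 - 3 * D3_N2) * sigma),
        "CL_mR": D2_N2 * sigma,
        "UCL_mR": (D2_N2 + 3 * D3_N2) * sigma,
    }


if __name__ == "__main__":
    for name, value in xmr_limits_known(4.46, 2.0494).items():
        print(f"{name:12s} {value:.4f}")
```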


[Spreadsheet screenshot: sample data columns (SampleNum, Value, mRange, Out-of-Control?) on the left; on the right, Xbar and mRbar, the LCL, CL, and UCL for the X and mR charts, the warning limits (e.g., U_2SIGMA(X)), and the constants d2 and d3, each in rows labeled ‘Estimated’ and ‘Theoretical’.]
Sample ‘xmrlim.xls’ Spreadsheet for XmR Chart Control Limits
For estimating control limits from sample data, the individual values must be entered in
the column labeled ‘Value’, and the SampleNum column is optional. In addition to copying and
pasting the Out-of-Control? formula down its column, the ‘mRange’ formula located in the
second cell under its heading must also be copied and pasted down its column for each
sample value.

XmR Control Charts (xmr.xls)

The  ‘xmr.xls’  spreadsheet  is  operated  the  same  as  ‘p.xls’  above  except  that 
individual  values  must  be  entered  instead  of  p  values,  and  there  is  a  separate  button  on  the 
blue  interface  screen  for  entering  the  warning  limits. 


x-bar  and  R  Charts 

x-bar and R Control Limits (xbrlim.xls)

The ‘xbrlim.xls’ spreadsheet is operated the same as ‘xmrlim.xls’ above with the following
exceptions:

For the theoretical limits calculation, the known standards must be entered into the second
cell under the headings ‘Xbar_bar’ (mean) and ‘Rbar’ (standard deviation). Note again that
the standard deviation (not the theoretical Rbar) must be entered into the second cell under
Rbar. Notice again that this row is labeled ‘Theoretical’ at the far right. This spreadsheet
is set up for a sample size of 24. If a different sample size is desired, it must be entered
into the cell below the label ‘Sample Size’ at the far left, and the corresponding tabulated
constants for that sample size must be entered in their respective cells under the ‘Rbar’
column. The control limits and warning limits will appear under their appropriate column
headings for both the x-bar and R charts. See the sample ‘xbrlim.xls’ spreadsheet below for
reference.


[Spreadsheet image not reproduced. Visible labels: PerfMeas (DnLk 1hr); columns SampleNum, Value, Smpl Max, Smpl Min, Xbar.]


Sample  ‘xbrlim.xls’  Spreadsheet  for  x-bar  and  R  Chart  Control  Limits 
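The requirement to enter the standard deviation under ‘Rbar’ makes sense because, with known standards, the R chart limits are multiples of the standard deviation. The sketch below shows the standard three-sigma ‘Theoretical’ calculation for a sample size of 24, using commonly tabulated constants (d2 = 3.895, D1 = 1.759, D2 = 6.031). It is an assumption that the spreadsheet uses exactly these formulas; the function name is hypothetical, and the warning limits are not computed because their multiplier is not stated here.

```python
import math

# Commonly tabulated control chart constants for subgroups of size n = 24
CONSTANTS_N24 = {"d2": 3.895, "D1": 1.759, "D2": 6.031}


def theoretical_limits(mu, sigma, n=24):
    """Three-sigma x-bar and R limits from a known mean mu and standard deviation sigma."""
    if n != 24:
        raise ValueError("CONSTANTS_N24 only covers n = 24; use table values for other n")
    c = CONSTANTS_N24
    a = 3 / math.sqrt(n)  # A factor: x-bar limits are mu +/- 3*sigma/sqrt(n)
    return {
        "Xbar_center": mu,
        "Xbar_UCL": mu + a * sigma,
        "Xbar_LCL": mu - a * sigma,
        "R_center": c["d2"] * sigma,  # expected subgroup range is d2 * sigma
        "R_UCL": c["D2"] * sigma,
        "R_LCL": c["D1"] * sigma,
    }


if __name__ == "__main__":
    # Hypothetical standards: mean 100, standard deviation 4
    print(theoretical_limits(100.0, 4.0))
```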
For estimating control limits from sample data, the individual values must be entered in the column labeled ‘Value’; the SampleNum column is again optional.

The desired sample size must now be entered in the cell below the label ‘Sample Size’, and some formula copying is also required. The copying is explained here using the default sample size of 24. Copy the entire block of cells under the headings ‘Smpl Max’, ‘Smpl Min’, ‘Xbar’, and ‘Range’, from the first cell under the headings to the last cell of one sample (i.e., the first through the 24th cell under the headings; note that most of these cells are empty), and then paste it down the columns directly below for all of the sample data. This block is the hatched area on the sample spreadsheet. After pasting, select the entire area under the two headings ‘Xbar’ and ‘Range’. From the pull-down menus, select ‘Edit, Go To, Special, Formulas, OK.’ The cells containing the sample means and ranges will now be selected. Copy them and paste these values under the headings ‘SmplXbar’ and ‘SmplRange.’ Copy and paste the Out-of-Control? formula down its column as described earlier. The control and warning limits will appear under their appropriate column headings for both the x-bar and R charts in the row labeled ‘Theoretical.’
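The copying procedure above amounts to splitting the values into consecutive samples and computing each sample's maximum, minimum, mean, and range before averaging. The following minimal sketch mirrors the ‘Smpl Max’, ‘Smpl Min’, ‘Xbar’, and ‘Range’ columns and then applies the standard estimated limits (Xbar_bar ± A2·Rbar, D3·Rbar, D4·Rbar). The factor table is an assumption that includes only n = 2 and n = 24 from standard tables, the function names are hypothetical, and incomplete trailing samples are rejected rather than silently dropped.

```python
from statistics import mean

# (A2, D3, D4) from standard control chart tables; only two sample sizes shown
FACTORS = {2: (1.880, 0.0, 3.267), 24: (0.157, 0.451, 1.548)}


def subgroup_stats(values, n=24):
    """Max, min, mean and range of each consecutive sample of size n."""
    if len(values) % n:
        raise ValueError("number of values must be a multiple of the sample size")
    stats = []
    for i in range(0, len(values), n):
        group = values[i:i + n]
        hi, lo = max(group), min(group)
        stats.append({"max": hi, "min": lo, "xbar": mean(group), "range": hi - lo})
    return stats


def estimated_limits(values, n=24):
    """Estimated x-bar and R control limits from the sample means and ranges."""
    a2, d3, d4 = FACTORS[n]
    stats = subgroup_stats(values, n)
    xbar_bar = mean(s["xbar"] for s in stats)
    r_bar = mean(s["range"] for s in stats)
    return {
        "Xbar_center": xbar_bar,
        "Xbar_UCL": xbar_bar + a2 * r_bar,
        "Xbar_LCL": xbar_bar - a2 * r_bar,
        "R_center": r_bar,
        "R_UCL": d4 * r_bar,
        "R_LCL": d3 * r_bar,
    }


if __name__ == "__main__":
    data = [1.0, 3.0, 2.0, 6.0, 4.0, 4.0]  # hypothetical data, three samples of size 2
    print(estimated_limits(data, n=2))
```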

x-bar and R Control Charts (xbr.xls)

The ‘xbr.xls’ spreadsheet is operated in the same way as ‘xmr.xls’ above, except that sample values must be entered instead of individual values.